CN117522881B - Cardiac image segmentation method based on attention mechanism and multi-level feature fusion - Google Patents

Cardiac image segmentation method based on attention mechanism and multi-level feature fusion

Info

Publication number
CN117522881B
CN117522881B (granted from application CN202311461592.2A)
Authority
CN
China
Prior art keywords
layer
convolution
feature map
convolution layer
module
Prior art date
Legal status
Active
Application number
CN202311461592.2A
Other languages
Chinese (zh)
Other versions
CN117522881A (en)
Inventor
陈长芳
翟纯琳
舒明雷
周书旺
孔祥龙
朱喆
Current Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology and Shandong Institute of Artificial Intelligence
Priority to CN202311461592.2A
Publication of CN117522881A
Application granted
Publication of CN117522881B
Legal status: Active
Anticipated expiration


Classifications

    • G06T 7/10 Image analysis; Segmentation; Edge detection
    • G06N 3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06V 10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10088 Magnetic resonance imaging [MRI]
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30048 Heart; Cardiac

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

A cardiac image segmentation method based on an attention mechanism and multi-level feature fusion relates to the technical field of image segmentation and adopts a symmetrical encoder-decoder structure. A dense cascade module is designed in which the dilated convolution layers are densely cascaded and the original input of the module is combined with the output after feature extraction, strengthening the propagation of image feature information. A position self-attention module is introduced to replace the bottom structure of the model; it fuses the global information of the input and effectively enhances both the robustness of the features and the local connections between them. A channel attention module is added in the skip connections to weight the feature maps across channels and select useful feature maps. A multi-level gating fusion module is added in the decoder part; it automatically adjusts the contributions of feature maps from different levels, makes full use of multi-level information and achieves better predictions.

Description

Cardiac image segmentation method based on attention mechanism and multi-level feature fusion
Technical Field
The invention relates to the technical field of image segmentation, in particular to a heart image segmentation method based on an attention mechanism and multi-level feature fusion.
Background
In recent years, the development of deep learning methods has had a profound effect on the field of cardiac image segmentation, making segmentation more accurate, efficient and adaptive while reducing manual workload, and cardiac MRI provides images with high resolution, high contrast and high signal-to-noise ratio in arbitrary orientations. From the segmentation result, indices such as myocardial mass, myocardial thickness, ejection fraction and ventricular volume can be obtained effectively, so accurate segmentation is particularly important. However, accurate segmentation remains challenging: non-uniform magnetic field strength easily produces artifacts during imaging and blurs boundaries, and the anatomy of the heart is complex.
With the rise of deep learning and convolutional neural networks, speed, high precision and high reliability have become the criteria for image segmentation. Among these methods, U-Net and its variants have been used by many researchers for cardiac MRI segmentation; the introduction of U-Net was the most influential and has become the basis of image segmentation. Further improvements are nevertheless still needed: such networks cannot integrate global information, and downsampling may lose spatial information. This is particularly disadvantageous for medical image segmentation, which typically requires extensive contextual detail.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the invention provides a cardiac image segmentation method based on an attention mechanism and multi-level feature fusion, which fuses the global information of the input and effectively enhances both the robustness of the features and the local connections between them.
The technical solution adopted to overcome the above technical problems is as follows:
a heart image segmentation method based on an attention mechanism and multi-level feature fusion comprises the following steps:
a) Acquiring a cardiac MRI image dataset X = {X1, X2, ..., Xi, ..., XN}, wherein Xi is the i-th cardiac MRI image, i ∈ {1, ..., N}, and N is the number of cardiac MRI images;
b) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X';
c) Dividing the preprocessed data set X' into a training set, a verification set and a test set;
d) Slicing each preprocessed cardiac MRI image in the training set along a Z axis to obtain M slice images, wherein the i-th slice image is Fi, i ∈ {1, ..., M};
e) Establishing a segmentation network model formed by an encoder and a decoder;
f) Inputting the ith slice image F i into an encoder of the segmentation network model, and outputting to obtain a feature map A 5-i;
g) Inputting the feature map A 5-i into a decoder of the segmentation network model, and outputting a segmentation result image P i;
h) Training a segmentation network model to obtain an optimized segmentation network model;
i) Slicing each preprocessed cardiac MRI image in the test set along a Z axis to obtain Q slice images, wherein the i-th slice image is Fi', i ∈ {1, ..., Q};
j) The i-th slice image F i 'is input to the optimized segmentation network model, and the segmentation result image P i' is output.
Preferably, in step a) the N cardiac MRI images are acquired from the public Automated Cardiac Diagnosis Challenge (ACDC) dataset.
Further, step b) comprises the steps of:
b-1) converting the i-th cardiac MRI image Xi into a Numpy array using the SimpleITK GetArrayFromImage() function, and cutting the converted array into V 2D slices along the Z-axis direction;
b-2) resampling each 2D slice to obtain V new 2D images with a pixel pitch of (1.5, 1.5), center-cropping each new 2D image to obtain V cropped 2D images of size 384×384, stacking the cropped 2D images to restore a 3D Numpy array, and converting the 3D Numpy array back into a cardiac MRI image using the SimpleITK GetImageFromArray() function;
b-3) flipping this cardiac MRI image along a horizontal or vertical axis, or rotating it 90 degrees clockwise or counterclockwise, each with probability 0.5, to obtain an augmented image, and normalizing the augmented image to obtain the preprocessed i-th cardiac MRI image X'i;
b-4) the N preprocessed cardiac MRI images constitute the preprocessed dataset X' = {X'1, X'2, ..., X'i, ..., X'N}; a hedged code sketch of this preprocessing pipeline is given below.
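To make steps b-1) through b-4) concrete, the following Python sketch mirrors the described pipeline. It is not the patented implementation: the helper name preprocess_volume, the use of scipy.ndimage.zoom for resampling and the z-score normalization are assumptions; the (1.5, 1.5) pixel pitch, the 384×384 center crop, the probability-0.5 flip/rotation and the SimpleITK array conversions come from the text.

```python
# Hedged preprocessing sketch for step b); values follow the text, helpers are illustrative.
import numpy as np
import SimpleITK as sitk
from scipy.ndimage import zoom

def preprocess_volume(image: sitk.Image, target_spacing=(1.5, 1.5), crop=384):
    vol = sitk.GetArrayFromImage(image)                # b-1) image -> Numpy array, shape (Z, H, W)
    _, sy, sx = image.GetSpacing()[::-1]               # SimpleITK spacing is (x, y, z); reverse it
    slices = []
    for s in vol:                                      # treat each Z slice as a 2D image
        # b-2) resample in-plane so the pixel pitch becomes (1.5, 1.5)
        s = zoom(s, (sy / target_spacing[0], sx / target_spacing[1]), order=1)
        # b-2) center-crop (pad first if the slice is smaller) to 384x384
        ph, pw = max(crop - s.shape[0], 0), max(crop - s.shape[1], 0)
        s = np.pad(s, ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)))
        top, left = (s.shape[0] - crop) // 2, (s.shape[1] - crop) // 2
        slices.append(s[top:top + crop, left:left + crop])
    vol = np.stack(slices, axis=0)                     # restack to a 3D Numpy array
    # b-3) random 90-degree rotation (clockwise or counterclockwise) with probability 0.5,
    # followed by normalization (z-score normalization assumed here)
    if np.random.rand() < 0.5:
        vol = np.rot90(vol, k=int(np.random.choice([1, -1])), axes=(1, 2)).copy()
    vol = (vol - vol.mean()) / (vol.std() + 1e-8)
    return sitk.GetImageFromArray(vol)                 # back to a SimpleITK image
```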
Preferably, in step c), the preprocessed data set X' is divided into a training set, a verification set and a test set according to a ratio of 7:1:2.
Preferably, M in step d) takes a value of 1312.
Further, step f) comprises the steps of:
f-1) the encoder of the segmentation network model comprises a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module;
F-2) the first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, and the ith slice image F i is input into the first dense cascade module and output to obtain a feature map A 1-i;
f-3) inputting the characteristic diagram A 1-i into a first maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 1-i;
f-4) the second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, the feature map A' 1-i is input into the second dense cascade module, and the feature map A 2-i is obtained through output;
f-5) inputting the characteristic diagram A 2-i into a second maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 2-i;
f-6) the third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, the characteristic diagram A' 2-i is input into the third dense cascade module, and the characteristic diagram A 3-i is obtained through output;
f-7) inputting the characteristic diagram A 3-i into a third maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 3-i;
f-8) a fourth dense cascade module of the encoder sequentially comprises a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, wherein a characteristic diagram A' 3-i is input into the fourth dense cascade module, and a characteristic diagram A 4-i is obtained through output;
f-9) inputting the characteristic diagram A 4-i into a fourth maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 4-i;
f-10) the position self-attention module of the encoder consists of a first convolution layer, a second convolution layer, a third convolution layer, a first bilinear interpolation layer, a second bilinear interpolation layer, a third bilinear interpolation layer, a softmax layer, a fourth bilinear interpolation layer and a fourth convolution layer; the feature map A' 4-i is input into the first convolution layer, and its output is passed through the first bilinear interpolation layer to obtain the feature map Q; the feature map A' 4-i is input into the second convolution layer, and its output is passed through the second bilinear interpolation layer to obtain the feature map K; the feature map A' 4-i is input into the third convolution layer, and its output is passed through the third bilinear interpolation layer to obtain the feature map V; the feature map Q is multiplied by the feature map K and the product is input into the softmax layer to output the feature map QK; the feature map QK is multiplied by the feature map V to obtain the feature map Att; the feature map Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then into the fourth convolution layer to obtain the feature map A 5-i.
Preferably, in step f-2) the first convolution layer of the first dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-3) the first maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-4) the first convolution layer of the second dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-5) the second maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-6) the first convolution layer of the third dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-7) the third maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-8) the first convolution layer of the fourth dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-9) the fourth maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-10) the convolution kernel sizes of the first, second, third and fourth convolution layers of the position self-attention module are all 1×1. A hedged code sketch of one dense cascade module with these parameters is given below.
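As a concrete illustration, below is a minimal PyTorch sketch of one dense cascade module with the kernel sizes and dilation rates listed above (3×3 kernels, dilation rates 1, 3, 5, 1). The text states only that the dilated convolution layers are densely cascaded and that the original input of the module is combined with the extracted features; the exact dense wiring, the channel counts and the BatchNorm/ReLU after each convolution are assumptions.

```python
# Hedged sketch of a dense cascade module; wiring and normalization are assumptions.
import torch
import torch.nn as nn

class DenseCascadeBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        def conv(cin, cout, d):
            # 3x3 convolution with dilation d; padding = d keeps the spatial size unchanged
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=d, dilation=d),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.c1 = conv(in_ch, out_ch, 1)               # first convolution layer, dilation 1
        self.c2 = conv(in_ch + out_ch, out_ch, 3)      # second layer, dilation 3, sees input + c1
        self.c3 = conv(in_ch + 2 * out_ch, out_ch, 5)  # third layer, dilation 5, sees input + c1 + c2
        self.c4 = conv(in_ch + 3 * out_ch, out_ch, 1)  # fourth layer fuses the input and all features

    def forward(self, x):
        f1 = self.c1(x)
        f2 = self.c2(torch.cat([x, f1], dim=1))        # dense connection
        f3 = self.c3(torch.cat([x, f1, f2], dim=1))    # dense connection
        return self.c4(torch.cat([x, f1, f2, f3], dim=1))
```

Stacking four such modules, each followed by a 2×2 maximum pooling layer with stride 2, reproduces the encoder path of steps f-1) through f-9).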
Further, step g) comprises the steps of:
g-1) the decoder of the segmentation network model is composed of a first double-convolution module, a second double-convolution module, a third double-convolution module, a fourth double-convolution module, a first dual-channel attention module, a second dual-channel attention module, a third dual-channel attention module, a fourth dual-channel attention module, a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer, a fifth upsampling layer and a multi-level gating fusion module; g-2) the feature map A 5-i is input into the fifth upsampling layer of the decoder, and the feature map C 5-i is output; g-3) the fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 4-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the fourth dual-channel attention module to output the feature map G a; the feature map A 4-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the fourth dual-channel attention module to output the feature map G m; the feature map G a and the feature map G m are added, and the sum is multiplied element-wise with the feature map A 4-i to obtain the feature map A'' 4-i; the feature map A'' 4-i and the feature map C 5-i are concatenated to obtain the feature map D 4-i; g-4) the fourth double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D 4-i is input into the fourth double-convolution module and the feature map C 4-i is output;
g-5) the third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 3-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the third dual-channel attention module to output the feature map G a'; the feature map A 3-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the third dual-channel attention module to output the feature map G m'; the feature map G a' and the feature map G m' are added, and the sum is multiplied element-wise with the feature map A 3-i to obtain the feature map A'' 3-i; the feature map C 4-i is input into the fourth upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 3-i to obtain the feature map D 3-i;
g-6) the third double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 3-i is input into the third double-convolution module and output to obtain a feature map C 3-i;
G-7) the second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 2-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the second dual-channel attention module to output the feature map G a''; the feature map A 2-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the second dual-channel attention module to output the feature map G m''; the feature map G a'' and the feature map G m'' are added, and the sum is multiplied element-wise with the feature map A 2-i to obtain the feature map A'' 2-i; the feature map C 3-i is input into the third upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 2-i to obtain the feature map D 2-i;
g-8) the second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 2-i is input into the second double-convolution module and output to obtain a feature map C 2-i;
G-9) the first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 1-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the first dual-channel attention module to output the feature map G a'''; the feature map A 1-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the first dual-channel attention module to output the feature map G m'''; the feature map G a''' and the feature map G m''' are added, and the sum is multiplied element-wise with the feature map A 1-i to obtain the feature map A'' 1-i; the feature map C 2-i is input into the second upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 1-i to obtain the feature map D 1-i;
g-10) the first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, wherein a feature map D 1-i is input into the first double-convolution module, and a feature map C 1-i is output and obtained;
g-11) the multi-level gating fusion module of the decoder consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer; the feature map C 2-i is input into the first upsampling layer to output the feature map C 2-i'; the feature map C 3-i is input into the second upsampling layer to output the feature map C 3-i'; the feature map C 4-i is input into the third upsampling layer to output the feature map C 4-i'; the feature maps C 2-i', C 3-i' and C 4-i' are concatenated and then sequentially input into the first convolution layer and the Sigmoid layer to output the weight matrix G; singular value decomposition is performed on the weight matrix G to obtain the left singular vector matrix W1, the diagonal matrix W2 and the right singular vector matrix W3; the feature map C 2-i is multiplied by the left singular vector matrix W1 to obtain the feature map W1'; the feature map C 3-i is multiplied by the diagonal matrix W2 to obtain the feature map W2'; the feature map C 4-i is multiplied by the right singular vector matrix W3 to obtain the feature map W3'; the feature maps W1', W2' and W3' are concatenated and input into the second convolution layer to output the feature map Z;
g-12) the feature map C 1-i is input into the first upsampling layer to output the feature map C 1-i'; the feature map C 1-i' is added to the feature map Z, the sum is input into a convolution layer with a convolution kernel size of 1×1, and the segmentation result image P i is output.
Preferably, in step g-3) the convolution kernel sizes of the first, second, third and fourth convolution layers of the fourth dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-4) the convolution kernel sizes of the first and second convolution layers of the fourth double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-5) the convolution kernel sizes of the first, second, third and fourth convolution layers of the third dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-6) the convolution kernel sizes of the first and second convolution layers of the third double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-7) the convolution kernel sizes of the first, second, third and fourth convolution layers of the second dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-8) the convolution kernel sizes of the first and second convolution layers of the second double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-9) the convolution kernel sizes of the first, second, third and fourth convolution layers of the first dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-10) the convolution kernel sizes of the first and second convolution layers of the first double-convolution module are 3×3, the strides are 1, and the padding is 1. A hedged code sketch of the dual-channel attention module is given below.
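For illustration, a hedged PyTorch sketch of the dual-channel attention module used in steps g-3), g-5), g-7) and g-9) follows: a global average pooling branch and a global maximum pooling branch each pass through two 1×1 convolutions with a ReLU in between and a Sigmoid at the end, the branch outputs G a and G m are summed, and the sum re-weights the input feature map element-wise. Keeping the channel count unchanged through the 1×1 convolutions and omitting padding on the 1×1 pooled maps are assumptions.

```python
# Hedged sketch of the dual-path channel attention used on the skip connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualChannelAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        def branch():
            # two 1x1 convolutions with ReLU in between and Sigmoid at the end
            return nn.Sequential(nn.Conv2d(channels, channels, 1), nn.ReLU(inplace=True),
                                 nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.avg_branch = branch()   # global average pooling path -> G_a
        self.max_branch = branch()   # global maximum pooling path -> G_m

    def forward(self, a):
        g_a = self.avg_branch(F.adaptive_avg_pool2d(a, 1))   # B x C x 1 x 1
        g_m = self.max_branch(F.adaptive_max_pool2d(a, 1))   # B x C x 1 x 1
        return a * (g_a + g_m)       # (G_a + G_m) broadcast and multiplied element-wise with A
```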
Further, step h) comprises the steps of:
h-1) calculating the total loss L total by the formula L total = αL CrossEntropy + (1-α)L Dice, wherein L CrossEntropy is the cross-entropy loss function, L Dice is the Dice loss function, and α is a weight;
h-2) training the segmentation network model with the Adam optimizer using the total loss L total to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05. A hedged sketch of this training objective is given below.
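The training objective of step h-1) and the optimizer settings of step h-2) can be sketched as follows; the soft-Dice formulation over softmax probabilities and the one-hot conversion are assumptions, while α = 0.05, the Adam optimizer and the learning rate 0.01 come from the text.

```python
# Hedged sketch of the combined loss L_total = alpha*L_CrossEntropy + (1-alpha)*L_Dice.
import torch
import torch.nn.functional as F

def dice_loss(probs, one_hot, eps=1e-6):
    # soft Dice over per-class probability maps
    inter = (probs * one_hot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def total_loss(logits, target, num_classes, alpha=0.05):
    # logits: B x C x H x W, target: B x H x W with integer class indices
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    return alpha * ce + (1 - alpha) * dice_loss(probs, one_hot)

# Training settings from step h-2): Adam optimizer, lr = 0.01, batch size 10, 200 epochs
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```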
The beneficial effects of the invention are as follows:
(1) A symmetrical encoder-decoder structure is employed. The convolution module in the network model is improved and a dense cascade module is proposed. A position self-attention mechanism is introduced to replace the bottom structure of the model, and a dual-path channel attention module is added in the skip connections. A multi-level gating fusion module is added in the decoder part to fuse features from different stages.
(2) The encoder consists of four dense cascade modules in which dilated convolution layers with dilation rates of 1, 3 and 5 are densely cascaded and the original input of the module is combined with the output after feature extraction, enlarging the receptive field and strengthening the propagation of image feature information.
(3) The channel attention module weights the feature maps across channels and selects useful feature maps. The position self-attention module in the bottom structure of the model fuses the global information of the input and effectively enhances the robustness of the features and the local connections between them, enabling better predictions.
(4) The multi-level gating fusion module automatically learns and adjusts the contribution of the feature map of each level; the dynamically learned weighting map controls the proportion of feature information from each level, and the features of all levels are then fused.
Drawings
FIG. 1 is a block diagram of the segmentation network of the present invention;
FIG. 2 is a block diagram of a dense cascading module of the present invention;
FIG. 3 is a block diagram of a dual channel attention module of the present invention;
FIG. 4 is a block diagram of a position self-attention module of the present invention;
Fig. 5 is a block diagram of a multi-level gated fusion module of the present invention.
Detailed Description
The invention is further described with reference to fig. 1 to 5.
A heart image segmentation method based on an attention mechanism and multi-level feature fusion comprises the following steps:
a) Acquiring a cardiac MRI image dataset X = {X1, X2, ..., Xi, ..., XN}, where Xi is the i-th cardiac MRI image, i ∈ {1, ..., N}, and N is the number of cardiac MRI images.
B) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X'.
C) The preprocessed data set X' is divided into a training set, a verification set and a test set.
D) Slicing each preprocessed cardiac MRI image in the training set along the Z-axis to obtain M slice images, the i-th slice image being Fi, i ∈ {1, ..., M}.
E) A segmentation network model consisting of an encoder and a decoder is built.
F) The i-th slice image F i is input to the encoder of the segmentation network model, and the feature map a 5-i is output.
G) The feature map a 5-i is input to a decoder of the segmentation network model, and the segmentation result image P i is output.
H) Training the segmentation network model to obtain an optimized segmentation network model.
I) Slicing each preprocessed cardiac MRI image in the test set along the Z axis to obtain Q slice images, the i-th slice image being Fi', i ∈ {1, ..., Q}.
J) The i-th slice image F i 'is input to the optimized segmentation network model, and the segmentation result image P i' is output.
A symmetrical encoder-decoder structure is employed. The convolution module in the network model is improved and a dense cascade module is proposed; the dilated convolution layers in the dense cascade module are densely cascaded, and the original input of the module is combined with the output after feature extraction, strengthening the propagation of image feature information. A position self-attention module is introduced to replace the bottom structure of the model; it fuses the global information of the input and effectively enhances the robustness of the features and the local connections between them. A channel attention module is added in the skip connections to weight the feature maps across channels and select useful feature maps. A multi-level gating fusion module is added in the decoder part; it automatically adjusts the contributions of feature maps from different levels and makes full use of multi-level information.
In one embodiment of the invention, in step a) the N cardiac MRI images are acquired from the public Automated Cardiac Diagnosis Challenge (ACDC) dataset.
In one embodiment of the invention, step b) comprises the steps of:
b-1) converting the i-th cardiac MRI image Xi into a Numpy array using the SimpleITK GetArrayFromImage() function, and cutting the converted array into V 2D slices along the Z-axis direction.
B-2) resampling each 2D slice to obtain V new 2D images with a pixel pitch of (1.5, 1.5), center-cropping each new 2D image to obtain V cropped 2D images of size 384×384, stacking the cropped 2D images to restore a 3D Numpy array, and converting the 3D Numpy array back into a cardiac MRI image using the SimpleITK GetImageFromArray() function.
B-3) flipping this cardiac MRI image along a horizontal or vertical axis, or rotating it 90 degrees clockwise or counterclockwise, each with probability 0.5, to obtain an augmented image, and normalizing the augmented image to obtain the preprocessed i-th cardiac MRI image X'i.
B-4) the N preprocessed cardiac MRI images form the preprocessed dataset X' = {X'1, X'2, ..., X'i, ..., X'N}.
In one embodiment of the present invention, the preprocessed data set X' is divided into a training set, a validation set and a test set in a ratio of 7:1:2 in step c).
In one embodiment of the invention, M in step d) takes on a value of 1312.
In one embodiment of the invention, step f) comprises the steps of:
f-1) The encoder of the segmentation network model is composed of a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module.
f-2) The first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the i-th slice image F i is input into the first dense cascade module and the feature map A 1-i is output.
F-3) The feature map A 1-i is input into the first maximum pooling layer of the encoder and the feature map A' 1-i is output. f-4) The second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A' 1-i is input into the second dense cascade module and the feature map A 2-i is output.
F-5) The feature map A 2-i is input into the second maximum pooling layer of the encoder and the feature map A' 2-i is output. f-6) The third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A' 2-i is input into the third dense cascade module and the feature map A 3-i is output.
F-7) The feature map A 3-i is input into the third maximum pooling layer of the encoder and the feature map A' 3-i is output. f-8) The fourth dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A' 3-i is input into the fourth dense cascade module and the feature map A 4-i is output.
F-9) The feature map A 4-i is input into the fourth maximum pooling layer of the encoder and the feature map A' 4-i is output. f-10) The position self-attention module of the encoder consists of a first convolution layer, a second convolution layer, a third convolution layer, a first bilinear interpolation layer, a second bilinear interpolation layer, a third bilinear interpolation layer, a softmax layer, a fourth bilinear interpolation layer and a fourth convolution layer; the feature map A' 4-i is input into the first convolution layer, and its output is passed through the first bilinear interpolation layer to obtain the feature map Q; the feature map A' 4-i is input into the second convolution layer, and its output is passed through the second bilinear interpolation layer to obtain the feature map K; the feature map A' 4-i is input into the third convolution layer, and its output is passed through the third bilinear interpolation layer to obtain the feature map V; the feature map Q is multiplied by the feature map K and the product is input into the softmax layer to output the feature map QK; the feature map QK is multiplied by the feature map V to obtain the feature map Att; the feature map Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then into the fourth convolution layer to obtain the feature map A 5-i. A hedged code sketch of this module is given after the parameter details below.
In this embodiment, in step f-2) the first convolution layer of the first dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-3) the first maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-4) the first convolution layer of the second dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-5) the second maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-6) the first convolution layer of the third dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-7) the third maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-8) the first convolution layer of the fourth dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-9) the fourth maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-10) the convolution kernel sizes of the first, second, third and fourth convolution layers of the position self-attention module are all 1×1.
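The following PyTorch sketch illustrates one plausible reading of the position self-attention module in step f-10): 1×1 convolutions followed by bilinear interpolation produce Q, K and V, a softmax over the product of Q and K gives the attention map QK, QK weights V to give Att, and a fourth bilinear interpolation plus a 1×1 convolution restore the output feature map. The downsampling factor of the first three interpolation layers and the reduced channel width of Q and K are assumptions not stated in the text.

```python
# Hedged sketch of the position self-attention module at the bottom of the network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionSelfAttention(nn.Module):
    def __init__(self, channels, scale=0.5):
        super().__init__()
        self.scale = scale                                     # assumed downsampling factor
        self.q_conv = nn.Conv2d(channels, channels // 8, 1)    # first 1x1 convolution
        self.k_conv = nn.Conv2d(channels, channels // 8, 1)    # second 1x1 convolution
        self.v_conv = nn.Conv2d(channels, channels, 1)         # third 1x1 convolution
        self.out_conv = nn.Conv2d(channels, channels, 1)       # fourth 1x1 convolution

    def forward(self, x):
        b, c, h, w = x.shape
        def down(t):  # bilinear interpolation layers 1-3
            return F.interpolate(t, scale_factor=self.scale, mode='bilinear', align_corners=False)
        q, k, v = down(self.q_conv(x)), down(self.k_conv(x)), down(self.v_conv(x))
        attn = torch.softmax(torch.bmm(q.flatten(2).transpose(1, 2),   # Q: B x N x C/8
                                       k.flatten(2)), dim=-1)          # K: B x C/8 x N -> QK: B x N x N
        att = torch.bmm(v.flatten(2), attn.transpose(1, 2))            # Att = QK applied to V: B x C x N
        att = att.view(b, c, v.shape[2], v.shape[3])
        att = F.interpolate(att, size=(h, w), mode='bilinear', align_corners=False)  # fourth interpolation
        return self.out_conv(att)                                      # output feature map A5
```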
In one embodiment of the invention, step g) comprises the steps of:
g-1) The decoder of the segmentation network model is composed of a first double-convolution module, a second double-convolution module, a third double-convolution module, a fourth double-convolution module, a first dual-channel attention module, a second dual-channel attention module, a third dual-channel attention module, a fourth dual-channel attention module, a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer, a fifth upsampling layer and a multi-level gating fusion module.
G-2) The feature map A 5-i is input into the fifth upsampling layer of the decoder and the feature map C 5-i is output. G-3) The fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 4-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the fourth dual-channel attention module to output the feature map G a; the feature map A 4-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the fourth dual-channel attention module to output the feature map G m; the feature map G a and the feature map G m are added, and the sum is multiplied element-wise with the feature map A 4-i to obtain the feature map A'' 4-i; the feature map A'' 4-i and the feature map C 5-i are concatenated to obtain the feature map D 4-i. g-4) The fourth double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D 4-i is input into the fourth double-convolution module and the feature map C 4-i is output.
G-5) The third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 3-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the third dual-channel attention module to output the feature map G a'; the feature map A 3-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the third dual-channel attention module to output the feature map G m'; the feature map G a' and the feature map G m' are added, and the sum is multiplied element-wise with the feature map A 3-i to obtain the feature map A'' 3-i; the feature map C 4-i is input into the fourth upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 3-i to obtain the feature map D 3-i.
g-6) The third double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D 3-i is input into the third double-convolution module and the feature map C 3-i is output.
G-7) The second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 2-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the second dual-channel attention module to output the feature map G a''; the feature map A 2-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the second dual-channel attention module to output the feature map G m''; the feature map G a'' and the feature map G m'' are added, and the sum is multiplied element-wise with the feature map A 2-i to obtain the feature map A'' 2-i; the feature map C 3-i is input into the third upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 2-i to obtain the feature map D 2-i.
G-8) the second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 2-i is input into the second double-convolution module and output to obtain a feature map C 2-i.
G-9) The first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 1-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the first dual-channel attention module to output the feature map G a'''; the feature map A 1-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the first dual-channel attention module to output the feature map G m'''; the feature map G a''' and the feature map G m''' are added, and the sum is multiplied element-wise with the feature map A 1-i to obtain the feature map A'' 1-i; the feature map C 2-i is input into the second upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 1-i to obtain the feature map D 1-i.
G-10) the first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 1-i is input into the first double-convolution module and output to obtain the feature map C 1-i.
G-11) The multi-level gating fusion module of the decoder consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer; the feature map C 2-i is input into the first upsampling layer to output the feature map C 2-i'; the feature map C 3-i is input into the second upsampling layer to output the feature map C 3-i'; the feature map C 4-i is input into the third upsampling layer to output the feature map C 4-i'; the feature maps C 2-i', C 3-i' and C 4-i' are concatenated and then sequentially input into the first convolution layer and the Sigmoid layer to output the weight matrix G; singular value decomposition (SVD) is performed on the weight matrix G to obtain the left singular vector matrix W1, the diagonal matrix W2 and the right singular vector matrix W3; the feature map C 2-i is multiplied by the left singular vector matrix W1 to obtain the feature map W1'; the feature map C 3-i is multiplied by the diagonal matrix W2 to obtain the feature map W2'; the feature map C 4-i is multiplied by the right singular vector matrix W3 to obtain the feature map W3'; the feature maps W1', W2' and W3' are concatenated and input into the second convolution layer to output the feature map Z. A hedged code sketch of this module is given after the parameter details below.
G-12) The feature map C 1-i is input into the first upsampling layer to output the feature map C 1-i'; the feature map C 1-i' is added to the feature map Z, the sum is input into a convolution layer with a convolution kernel size of 1×1, and the segmentation result image P i is output.
In this embodiment, in step g-3) the convolution kernel sizes of the first, second, third and fourth convolution layers of the fourth dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-4) the convolution kernel sizes of the first and second convolution layers of the fourth double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-5) the convolution kernel sizes of the first, second, third and fourth convolution layers of the third dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-6) the convolution kernel sizes of the first and second convolution layers of the third double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-7) the convolution kernel sizes of the first, second, third and fourth convolution layers of the second dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-8) the convolution kernel sizes of the first and second convolution layers of the second double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-9) the convolution kernel sizes of the first, second, third and fourth convolution layers of the first dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-10) the convolution kernel sizes of the first and second convolution layers of the first double-convolution module are 3×3, the strides are 1, and the padding is 1.
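Finally, a hedged sketch of the multi-level gating fusion module of step g-11) is given below. The upsampling of the three decoder levels, the 1×1 convolution and Sigmoid that produce the gate G, the singular value decomposition of G into W1, W2 and W3, and the final fusion convolution follow the text; how the SVD factors are multiplied onto multi-channel feature maps is not specified, so the per-sample matrix products on square spatial grids used here (applied to the upsampled maps so that shapes match) are assumptions.

```python
# Hedged sketch of the multi-level gating fusion module; the SVD broadcasting is an assumption.
# Assumes square spatial grids (H == W) so the matrix products preserve the map shapes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelGatedFusion(nn.Module):
    def __init__(self, ch2, ch3, ch4, num_classes):
        super().__init__()
        self.gate_conv = nn.Conv2d(ch2 + ch3 + ch4, 1, 1)             # first convolution layer
        self.fuse_conv = nn.Conv2d(ch2 + ch3 + ch4, num_classes, 1)   # second convolution layer

    def forward(self, c2, c3, c4, size):
        up = lambda t: F.interpolate(t, size=size, mode='bilinear', align_corners=False)
        c2u, c3u, c4u = up(c2), up(c3), up(c4)                        # C2', C3', C4'
        g = torch.sigmoid(self.gate_conv(torch.cat([c2u, c3u, c4u], dim=1)))  # weight matrix G
        u, s, vh = torch.linalg.svd(g.squeeze(1), full_matrices=False)        # G = W1 * W2 * W3
        w2 = torch.diag_embed(s)                                      # diagonal matrix of singular values
        f2 = torch.einsum('bij,bcjw->bciw', u, c2u)                   # level-2 features weighted by W1
        f3 = torch.einsum('bij,bcjw->bciw', w2, c3u)                  # level-3 features weighted by W2
        f4 = torch.einsum('bchw,bwk->bchk', c4u, vh)                  # level-4 features weighted by W3
        return self.fuse_conv(torch.cat([f2, f3, f4], dim=1))         # fused feature map Z
```

The output Z is then added to the upsampled top-level feature map C 1-i' and passed through a final 1×1 convolution, as in step g-12).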
Step h) comprises the steps of:
h-1) calculating the total loss L total by the formula L total = αL CrossEntropy + (1-α)L Dice, wherein L CrossEntropy is the cross-entropy loss function, L Dice is the Dice loss function, and α is a weight;
h-2) training the segmentation network model with the Adam optimizer using the total loss L total to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05.
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention and the invention is not limited thereto. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or replace some of the technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A cardiac image segmentation method based on an attention mechanism and multi-level feature fusion, characterized by comprising the following steps:
a) Acquiring a cardiac MRI image dataset X = {X1, X2, ..., Xi, ..., XN}, wherein Xi is the i-th cardiac MRI image, i ∈ {1, ..., N}, and N is the number of cardiac MRI images;
b) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X';
c) Dividing the preprocessed data set X' into a training set, a verification set and a test set;
d) Slicing each preprocessed cardiac MRI image in the training set along a Z axis to obtain M slice images, wherein the i-th slice image is Fi, i ∈ {1, ..., M};
e) Establishing a segmentation network model formed by an encoder and a decoder;
f) Inputting the ith slice image F i into an encoder of the segmentation network model, and outputting to obtain a feature map A 5-i;
g) Inputting the feature map A 5-i into a decoder of the segmentation network model, and outputting a segmentation result image P i;
h) Training a segmentation network model to obtain an optimized segmentation network model;
i) Slicing each preprocessed cardiac MRI image in the test set along a Z axis to obtain Q slice images, wherein the i-th slice image is Fi', i ∈ {1, ..., Q};
j) Inputting the ith slice image F i 'into the optimized segmentation network model, and outputting a segmentation result image P i';
step f) comprises the steps of:
f-1) the encoder of the segmentation network model comprises a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module;
f-2) the first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the i-th slice image F_i is input into the first dense cascade module, and a feature map A_1-i is obtained through output;
f-3) inputting the feature map A_1-i into the first maximum pooling layer of the encoder, and outputting to obtain a feature map A′_1-i;
f-4) the second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′_1-i is input into the second dense cascade module, and a feature map A_2-i is obtained through output;
f-5) inputting the feature map A_2-i into the second maximum pooling layer of the encoder, and outputting to obtain a feature map A′_2-i;
f-6) the third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′_2-i is input into the third dense cascade module, and a feature map A_3-i is obtained through output;
f-7) inputting the feature map A_3-i into the third maximum pooling layer of the encoder, and outputting to obtain a feature map A′_3-i;
f-8) the fourth dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′_3-i is input into the fourth dense cascade module, and a feature map A_4-i is obtained through output;
f-9) inputting the feature map A_4-i into the fourth maximum pooling layer of the encoder, and outputting to obtain a feature map A′_4-i;
f-10) the position self-attention module of the encoder consists of a first convolution layer, a second convolution layer, a third convolution layer, a first bilinear interpolation layer, a second bilinear interpolation layer, a third bilinear interpolation layer, a softmax layer, a fourth bilinear interpolation layer and a fourth convolution layer; the feature map A′_4-i is input into the first convolution layer, and the resulting feature map is input into the first bilinear interpolation layer for bilinear interpolation to obtain a feature map Q; the feature map A′_4-i is input into the second convolution layer, and the resulting feature map is input into the second bilinear interpolation layer for bilinear interpolation to obtain a feature map K; the feature map A′_4-i is input into the third convolution layer, and the resulting feature map is input into the third bilinear interpolation layer for bilinear interpolation to obtain a feature map V; the feature map Q is multiplied by the feature map K and the product is input into the softmax layer to output a feature map QK; the feature map QK is multiplied by the feature map V to obtain a feature map Att; the feature map Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then into the fourth convolution layer to obtain the feature map A_5-i (an illustrative sketch of this module is given after claim 1 below);
Step g) comprises the steps of:
g-1) the decoder for dividing the network model is composed of a first double convolution module, a second double convolution module, a third double convolution module, a fourth double convolution module, a first double-channel attention module, a second double-channel attention module, a third double-channel attention module, a fourth double-channel attention module, a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer, a fifth upsampling layer and a multi-level gating fusion module;
g-2) inputting the feature map A 5-i into a fifth upsampling layer of the decoder, and outputting to obtain a feature map C 5-i;
g-3) the fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′_4-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the fourth dual-channel attention module to output a feature map G_a; the feature map A′_4-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the fourth dual-channel attention module to output a feature map G_m; the feature map G_a and the feature map G_m are added, and the sum is multiplied element by element with the feature map A′_4-i to obtain a feature map A″_4-i; the feature map A″_4-i is spliced with the feature map C_5-i output by the fifth upsampling layer of the decoder to obtain a feature map D_4-i;
The fourth double-convolution module of the g-4) decoder is composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer in sequence, the feature map D 4-i is input into the fourth double-convolution module, and the feature map C 4-i is output and obtained;
g-5) the third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′_3-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the third dual-channel attention module to output a feature map G_a′; the feature map A′_3-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the third dual-channel attention module to output a feature map G_m′; the feature map G_a′ and the feature map G_m′ are added, and the sum is multiplied element by element with the feature map A′_3-i to obtain a feature map A″_3-i; the feature map C_4-i is input into the fourth upsampling layer of the decoder to output a feature map C′_3-i, and the feature map A″_3-i is spliced with the feature map C′_3-i to obtain a feature map D_3-i;
g-6) the third double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 3-i is input into the third double-convolution module and output to obtain a feature map C 3-i;
g-7) the second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′_2-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the second dual-channel attention module to output a feature map G_a″; the feature map A′_2-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the second dual-channel attention module to output a feature map G_m″; the feature map G_a″ and the feature map G_m″ are added, and the sum is multiplied element by element with the feature map A′_2-i to obtain a feature map A″_2-i; the feature map C_3-i is input into the third upsampling layer of the decoder to output a feature map C′_2-i, and the feature map A″_2-i is spliced with the feature map C′_2-i to obtain a feature map D_2-i;
g-8) the second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 2-i is input into the second double-convolution module and output to obtain a feature map C 2-i;
g-9) the first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′_1-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the first dual-channel attention module to output a feature map G_a‴; the feature map A′_1-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the first dual-channel attention module to output a feature map G_m‴; the feature map G_a‴ and the feature map G_m‴ are added, and the sum is multiplied element by element with the feature map A′_1-i to obtain a feature map A″_1-i; the feature map C_2-i is input into the second upsampling layer of the decoder to output a feature map C′_1-i, and the feature map A″_1-i is spliced with the feature map C′_1-i to obtain a feature map D_1-i;
g-10) the first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, wherein a feature map D 1-i is input into the first double-convolution module, and a feature map C 1-i is output and obtained;
g-11) the multi-level gating fusion module of the decoder consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer; the feature map C_2-i is input into the first upsampling layer to output a feature map C_2-i′, the feature map C_3-i is input into the second upsampling layer to output a feature map C_3-i′, and the feature map C_4-i is input into the third upsampling layer to output a feature map C_4-i′; the feature maps C_2-i′, C_3-i′ and C_4-i′ are spliced and sequentially input into the first convolution layer and the Sigmoid layer to output a weight matrix G; singular value decomposition is performed on the weight matrix G to obtain a left singular vector matrix W1, a diagonal matrix W2 and a right singular vector matrix W3; the feature map C_2-i is multiplied by the left singular vector matrix W1 to obtain a feature map W1′, the feature map C_3-i is multiplied by the diagonal matrix W2 to obtain a feature map W2′, and the feature map C_4-i is multiplied by the right singular vector matrix W3 to obtain a feature map W3′; the feature maps W1′, W2′ and W3′ are spliced and input into the second convolution layer to output a feature map Z;
g-12) the feature map C_1-i is input into the first upsampling layer to output a feature map C_1-i′; the feature map C_1-i′ is added to the feature map Z, and the sum is input into a convolution layer with a convolution kernel size of 1×1 to output the segmentation result image P_i.
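Step f-10) of claim 1 describes the position self-attention module at the bottleneck of the encoder. The following is a minimal PyTorch-style sketch of one way to realize it; the reduced attention resolution (attn_size) and the exact reshaping of the interpolated Q, K and V maps into matrices for the Q·K product are assumptions made to keep the sketch runnable, not details taken from the claim.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionSelfAttention(nn.Module):
    """Illustrative position self-attention: 1x1 convs, bilinear resizing, softmax(Q*K)*V."""
    def __init__(self, channels: int, attn_size: int = 12):
        super().__init__()
        self.attn_size = attn_size
        self.q_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.k_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s = self.attn_size
        # 1x1 convolutions followed by bilinear interpolation to the attention resolution
        q = F.interpolate(self.q_conv(x), size=(s, s), mode="bilinear", align_corners=False)
        k = F.interpolate(self.k_conv(x), size=(s, s), mode="bilinear", align_corners=False)
        v = F.interpolate(self.v_conv(x), size=(s, s), mode="bilinear", align_corners=False)
        q = q.flatten(2).transpose(1, 2)        # (b, s*s, c)
        k = k.flatten(2)                        # (b, c, s*s)
        v = v.flatten(2).transpose(1, 2)        # (b, s*s, c)
        attn = torch.softmax(q @ k, dim=-1)     # softmax over the Q*K similarity map
        att = (attn @ v).transpose(1, 2).reshape(b, c, s, s)
        # interpolate back to the input resolution and apply the final 1x1 convolution
        att = F.interpolate(att, size=(h, w), mode="bilinear", align_corners=False)
        return self.out_conv(att)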
2. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step a), the N cardiac MRI images are acquired from the public Automated Cardiac Diagnosis Challenge dataset.
3. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: step b) comprises the steps of:
b-1) converting the i-th cardiac MRI image X_i into a Numpy array by using the GetArrayFromImage() function of the SimpleITK library, and cutting the i-th cardiac MRI image X_i converted into a Numpy array into V 2D slices along the Z-axis direction;
b-2) resampling each 2D slice to obtain V new 2D images with a pixel pitch of (1.5, 1.5), center-cropping each new 2D image to obtain V cropped 2D images of size 384×384, stacking the cropped 2D images to restore a 3D Numpy array, and converting the 3D Numpy array back into a cardiac MRI image by using the GetImageFromArray() function of the SimpleITK library;
b-3) rotating the cardiac MRI image obtained in step b-2) by 90 degrees clockwise or counterclockwise along the horizontal axis or the vertical axis with a probability of 0.5 to obtain a rotated image, and performing a normalization operation on the rotated image to obtain the preprocessed i-th cardiac MRI image X′_i;
b-4) the N preprocessed cardiac MRI images form the preprocessed dataset X′, X′ = {X′_1, X′_2, ..., X′_i, ..., X′_N}.
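For illustration, the preprocessing of claim 3 can be sketched as follows. SimpleITK's GetArrayFromImage/GetImageFromArray usage, the zoom-based per-slice resampling and the simplified 90° rotation of the whole stack shown here are one plausible realization under stated assumptions, not the patent's exact procedure; the 1.5 mm spacing, 384×384 crop, 0.5 flip probability and normalization come from the claim.

import numpy as np
import SimpleITK as sitk
from scipy.ndimage import zoom

def preprocess(image: sitk.Image, target_spacing=(1.5, 1.5), size=384) -> np.ndarray:
    vol = sitk.GetArrayFromImage(image)                 # (slices, H, W) along the Z axis
    sy, sx = image.GetSpacing()[1], image.GetSpacing()[0]
    slices = []
    for sl in vol:                                      # resample every 2D slice to 1.5 x 1.5 mm
        sl = zoom(sl, (sy / target_spacing[0], sx / target_spacing[1]), order=1)
        h, w = sl.shape                                 # pad if needed, then center-crop to 384 x 384
        pad_h, pad_w = max(size - h, 0), max(size - w, 0)
        sl = np.pad(sl, ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)))
        h, w = sl.shape
        top, left = (h - size) // 2, (w - size) // 2
        slices.append(sl[top:top + size, left:left + size])
    vol = np.stack(slices)                              # restore the 3D stack
    if np.random.rand() < 0.5:                          # random +/-90 degree rotation with probability 0.5
        vol = np.rot90(vol, k=np.random.choice([1, -1]), axes=(1, 2)).copy()
    return (vol - vol.mean()) / (vol.std() + 1e-8)      # normalization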
4. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in the step c), the preprocessed data set X' is divided into a training set, a verification set and a test set according to the proportion of 7:1:2.
5. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step d), M takes on a value of 1312.
6. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in the step f-2), the convolution kernel size of the first convolution layer of the first dense cascade module is 3×3, the expansion rate is 1, the convolution kernel size of the second convolution layer is 3×3, the expansion rate is 3, the convolution kernel size of the third convolution layer is 3×3, the expansion rate is 5, and the convolution kernel size of the fourth convolution layer is 3×3, the expansion rate is 1; step f-3) wherein the step size of the first largest pooling layer is 2 and the pooling kernel size is 2 x 2; in the step f-4), the convolution kernel size of the first convolution layer of the second dense cascade module is 3×3, the expansion rate is 1, the convolution kernel size of the second convolution layer is 3×3, the expansion rate is 3, the convolution kernel size of the third convolution layer is 3×3, the expansion rate is 5, and the convolution kernel size of the fourth convolution layer is 3×3, the expansion rate is 1; step f-5) wherein the step size of the second largest pooling layer is 2 and the pooling kernel size is 2 x 2; in the step f-6), the convolution kernel size of the first convolution layer of the third dense cascade module is 3×3, the expansion rate is 1, the convolution kernel size of the second convolution layer is 3×3, the expansion rate is 3, the convolution kernel size of the third convolution layer is 3×3, the expansion rate is 5, and the convolution kernel size of the fourth convolution layer is 3×3, the expansion rate is 1; step f-7) wherein the step size of the third largest pooling layer is 2 and the pooling kernel size is 2 x 2; the first convolution layer of the fourth dense cascade module in step f-8) has a convolution kernel size of 3×3 and an expansion ratio of 1, the second convolution layer has a convolution kernel size of 3×3 and an expansion ratio of 3, the third convolution layer has a convolution kernel size of 3×3 and an expansion ratio of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and an expansion ratio of 1; step f-9) wherein the fourth maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; the convolution kernel sizes of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the self-attention module in the step f-10) are all 1×1.
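For illustration, a dense cascade module consistent with the kernel sizes and dilation rates of claim 6 can be sketched as follows; the ReLU activations, the padding values (set equal to the dilation rate to preserve spatial size) and the dense concatenation before the final convolution are assumptions suggested by the module's name, since the claim only fixes the 3×3 kernels and the dilation rates 1, 3, 5, 1.

import torch
import torch.nn as nn

class DenseCascadeModule(nn.Module):
    """Illustrative encoder block: four 3x3 convolutions with dilation rates 1, 3, 5, 1."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1, dilation=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=3, dilation=3)
        self.conv3 = nn.Conv2d(out_ch, out_ch, 3, padding=5, dilation=5)
        # the fourth convolution fuses the concatenated intermediate feature maps
        self.conv4 = nn.Conv2d(3 * out_ch, out_ch, 3, padding=1, dilation=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.act(self.conv1(x))
        f2 = self.act(self.conv2(f1))
        f3 = self.act(self.conv3(f2))
        return self.act(self.conv4(torch.cat([f1, f2, f3], dim=1)))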
7. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step g-3), the convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the fourth dual-channel attention module are all 1×1, the strides are all 1, and the paddings are all 1; in step g-4), the convolution kernels of the first convolution layer and the second convolution layer of the fourth double convolution module are 3×3, the strides are 1, and the paddings are 1; in step g-5), the convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the third dual-channel attention module are all 1×1, the strides are all 1, and the paddings are all 1; in step g-6), the convolution kernels of the first convolution layer and the second convolution layer of the third double convolution module are 3×3, the strides are 1, and the paddings are 1; in step g-7), the convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the second dual-channel attention module are all 1×1, the strides are all 1, and the paddings are all 1; in step g-8), the convolution kernels of the first convolution layer and the second convolution layer of the second double convolution module are 3×3, the strides are 1, and the paddings are 1; in step g-9), the convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the first dual-channel attention module are all 1×1, the strides are all 1, and the paddings are all 1; in step g-10), the convolution kernels of the first convolution layer and the second convolution layer of the first double convolution module are 3×3, the strides are 1, and the paddings are 1.
8. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein step h) comprises the steps of:
h-1) calculating the total loss L_total through the formula L_total = αL_CrossEntropy + (1-α)L_Dice, wherein L_CrossEntropy is the cross-entropy loss function, L_Dice is the Dice loss function, and α is a weight;
h-2) training the segmentation network model with the Adam optimizer using the total loss L_total to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05.
CN202311461592.2A 2023-11-06 2023-11-06 Cardiac image segmentation method based on attention mechanism and multi-level feature fusion Active CN117522881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311461592.2A CN117522881B (en) 2023-11-06 2023-11-06 Cardiac image segmentation method based on attention mechanism and multi-level feature fusion


Publications (2)

Publication Number Publication Date
CN117522881A CN117522881A (en) 2024-02-06
CN117522881B true CN117522881B (en) 2024-06-18

Family

ID=89759877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311461592.2A Active CN117522881B (en) 2023-11-06 2023-11-06 Cardiac image segmentation method based on attention mechanism and multi-level feature fusion

Country Status (1)

Country Link
CN (1) CN117522881B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612131A (en) * 2023-05-22 2023-08-18 山东省人工智能研究院 Cardiac MRI structure segmentation method based on ADC-UNet model

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10922816B2 (en) * 2018-08-27 2021-02-16 Siemens Healthcare Gmbh Medical image segmentation from raw data using a deep attention neural network
CN109389078B (en) * 2018-09-30 2022-06-21 京东方科技集团股份有限公司 Image segmentation method, corresponding device and electronic equipment
US11270447B2 (en) * 2020-02-10 2022-03-08 Hong Kong Applied Science And Technology Institute Company Limited Method for image segmentation using CNN
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN115375711A (en) * 2022-09-19 2022-11-22 安徽大学 Image segmentation method of global context attention network based on multi-scale fusion
CN116843696B (en) * 2023-04-27 2024-04-09 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention
CN116740076A (en) * 2023-05-15 2023-09-12 苏州大学 Network model and method for pigment segmentation in retinal pigment degeneration fundus image
CN116563265B (en) * 2023-05-23 2024-03-01 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion


Also Published As

Publication number Publication date
CN117522881A (en) 2024-02-06

Similar Documents

Publication Publication Date Title
WO2023231329A1 (en) Medical image semantic segmentation method and apparatus
CN109584161A (en) The Remote sensed image super-resolution reconstruction method of convolutional neural networks based on channel attention
CN110675321A (en) Super-resolution image reconstruction method based on progressive depth residual error network
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN109214989A (en) Single image super resolution ratio reconstruction method based on Orientation Features prediction priori
CN111932461A (en) Convolutional neural network-based self-learning image super-resolution reconstruction method and system
CN111583285A (en) Liver image semantic segmentation method based on edge attention strategy
CN107341776A (en) Single frames super resolution ratio reconstruction method based on sparse coding and combinatorial mapping
CN111667407B (en) Image super-resolution method guided by depth information
CN113298717A (en) Medical image super-resolution reconstruction method based on multi-attention residual error feature fusion
CN112561799A (en) Infrared image super-resolution reconstruction method
CN112365422A (en) Irregular missing image restoration method and system based on deep aggregation network
CN115239674B (en) Computer angiography imaging synthesis method based on multi-scale discrimination
CN115565056A (en) Underwater image enhancement method and system based on condition generation countermeasure network
CN110853048A (en) MRI image segmentation method, device and storage medium based on rough training and fine training
CN115578427A (en) Unsupervised single-mode medical image registration method based on deep learning
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
CN116739899A (en) Image super-resolution reconstruction method based on SAUGAN network
CN114998458A (en) Undersampled magnetic resonance image reconstruction method based on reference image and data correction
CN115222592A (en) Underwater image enhancement method based on super-resolution network and U-Net network and training method of network model
CN113379606B (en) Face super-resolution method based on pre-training generation model
CN117522881B (en) Cardiac image segmentation method based on attention mechanism and multi-level feature fusion
CN117036162B (en) Residual feature attention fusion method for super-resolution of lightweight chest CT image
CN116051609B (en) Unsupervised medical image registration method based on band-limited deformation Fourier network
CN115859606A (en) SAR-optical image translation method and system based on image evaluation and feature selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant