CN111932550B - 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning - Google Patents


Info

Publication number
CN111932550B
CN111932550B (application CN202010622947.1A)
Authority
CN
China
Prior art keywords
convolution
deformable
image
module
mri
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010622947.1A
Other languages
Chinese (zh)
Other versions
CN111932550A (en)
Inventor
田梅
董舜杰
卓成
张宏
施政学
赵金龙
张茂俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010622947.1A priority Critical patent/CN111932550B/en
Publication of CN111932550A publication Critical patent/CN111932550A/en
Application granted granted Critical
Publication of CN111932550B publication Critical patent/CN111932550B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30048 Heart; Cardiac

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The invention discloses a 3D ventricular nuclear magnetic resonance video segmentation system based on deep learning, which comprises: an MRI data preprocessing module; and a depth attention network based on deformable convolution, in which the depth space-time deformable convolution fusion module TDAM feeds consecutive 3D ventricular MRI video slice images along the time axis into the network to obtain a compensation region (offset field) for the high-dimensional features within the MRI video segment and extracts the compensated high-dimensional image features through a deformable convolution layer, while a deformable convolution attention module produces an attention feature map and an additive attention module suppresses irrelevant background, finally yielding the network model. A newly input 3D ventricular MRI video is segmented directly with the trained network model. By introducing multi-frame image compensation, deformable convolution and an additive attention mechanism, the system effectively improves the accuracy and efficiency of ventricular segmentation and exhibits high robustness.

Description

3D ventricle nuclear magnetic resonance video segmentation system based on deep learning
Technical Field
The invention relates to the technical field of medical image engineering, and in particular to a 3D ventricular nuclear magnetic resonance video segmentation system based on deep learning.
Background
With the development of medical imaging technology and artificial intelligence, automatic and semi-automatic computer-aided diagnosis systems are gradually replacing traditional manual diagnosis in accurate diagnosis and treatment. Magnetic resonance imaging (MRI) is widely used in ventricular diagnostics because it causes no radiation damage and offers high resolution. To better understand the condition of a patient's ventricles, the position of each part of the ventricle must be segmented correctly by an accurate segmentation system; however, the conventional clinical procedure of visually evaluating three-dimensional MRI images is time-consuming and depends on the clinical experience of the doctor. It is therefore important to find a system that improves the accuracy and efficiency of diagnosing the parts of the cardiac chambers.
The main challenges faced by the prior art are: 1. Magnetic resonance imaging is very sensitive to body movements of the patient and prone to artifacts, and subtle changes are easily missed by the detection system, reducing detection sensitivity. 2. Most existing algorithms are only suitable for processing two-dimensional natural images, whereas MRI data are three-dimensional structures formed by parallel scanned image frames, so two-dimensional localization algorithms ignore important inter-frame information. 3. The patient's heart chambers deform severely with breathing, so regions of the same tissue can deform greatly, especially the myocardium surrounding the left ventricle and the right ventricle, which strongly interferes with and challenges the segmentation system. 4. Because the amount of medical image data is small and high-quality annotated data and training samples are lacking, the trained model may overfit or generalize poorly.
In summary, providing a deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation system that exploits continuity information within and between MRI video frames to improve the accuracy and efficiency of ventricular segmentation is an important technical problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a 3D ventricular nuclear magnetic resonance video segmentation system based on deep learning that addresses the shortcomings of the prior art in medical-image ventricular segmentation, automatically segments the positions of all parts of the ventricle, delivers highly accurate localization results, and gives the model high robustness.
The purpose of the invention is realized by the following technical scheme: a 3D ventricular magnetic resonance video segmentation system based on deep learning, characterized by comprising a 3D ventricular magnetic resonance MRI video data preprocessing module, a Deformable convolution depth attention network Deformable U-Net (DeU-Net) and an image detection module;
the 3D ventricular magnetic resonance MRI video data preprocessing module comprises a data enhancement module and a data division module:
the data enhancement module: splitting the existing 3D ventricular MRI video data set into MRI images of each frame, expanding the data set and carrying out normalization processing on the image size;
the data dividing module: dividing the enhanced image data into a training set and a testing set; the training set and the test set both comprise complete 3D ventricular MRI images, and the training set is used for training a Deformable convolution depth attention network Deformable U-Net;
the Deformable convolution depth attention network Deformable U-Net comprises a depth space-time Deformable convolution fusion module TDAM and a depth Deformable convolution global attention module DGPA:
the depth space-time deformable convolution fusion module TDAM: the module comprises a U-Net network and a deformable convolution layer; the TDAM inputs each frame of the continuous 3D ventricular MRI video along the time axis into the U-Net network, which outputs an offset field describing the high-dimensional feature compensation region of the image within the MRI video segment; this offset field is passed, together with the input image, into the deformable convolution layer, which computes the compensated high-dimensional fused feature maps of the image, i.e., fused features containing information from the preceding and following frames;
the deep deformable convolution global attention module DGPA: the module comprises a U-Net network, a deformable convolution attention module, three additive attention modules and an output layer; the deformable convolution attention module builds on a spatial attention module by passing its input through a deformable convolution layer and adding the computed output of the deformable convolution layer to the output of the spatial attention module, which yields the output of the deformable convolution attention module; a deformable convolution attention module is added in the first-layer skip connection of the U-Net network, and an additive attention module is added in each of the other three skip connections; the compensated high-dimensional features of the image are input into the U-Net network, the high-dimensional features computed by the U-Net network are passed to the output layer to obtain an attention feature map, and a softmax regression function then yields the segmentation probability, i.e., the probability that a given region in the MRI image belongs to the left ventricle, the myocardium or the right ventricle;
the image detection module is used to segment the 3D ventricular regions: the probability heat maps of the 3D ventricular MRI images in the test set are computed with the trained network, and the probability heat map corresponding to each ventricular MRI image is segmented according to the segmentation probability obtained by the DGPA, yielding the segmentation results, namely the left ventricle region, the myocardium region and the right ventricle region.
Further, in the image processing process, the data enhancement module expands the data set through rotating, adjusting contrast and zooming, and divides the 3D ventricular MRI video into four directions of x, y, z and t, wherein x, y and z represent a space coordinate system, t represents a time axis, and a video frame of an x-y plane is selected.
Furthermore, before each frame of MRI image is input into the Deformable U-Net network, the r frames preceding and the r frames following the target frame along the time axis t are selected together with the target frame as the Deformable U-Net input, i.e., 2r+1 MRI image frames.
Furthermore, the depth space-time deformable convolution fusion module TDAM comprises a 9-layer structure. The first layer comprises a convolution layer and a ReLU function and converts the number of input channels, (2r+1) × in_c, into nf, where in_c is the number of input image channels and nf is the number of output channels of the custom convolution layer. Layers 2 to 4 are down-sampling structures comprising two convolution layers and two ReLU functions. Layers 5 to 6 are up-sampling structures comprising a convolution layer, a deconvolution layer and two ReLU functions. Layer 7 is a skip-transfer structure that processes the features obtained by down-sampling and fuses them with the up-sampling result; it comprises two convolution layers, a deconvolution layer and three ReLU functions. Layer 8 is an offset output structure comprising two convolution layers and a ReLU function, with the second convolution layer outputting (2r+1) × 2 × (deform_ks)² channels, where deform_ks is the deformable convolution kernel size. The ninth layer is a deformable convolution and a ReLU function that takes the input image and the offsets as its input and produces the high-dimensional fused feature maps of the image.
Further, in the depth space-time deformable convolution fusion module TDAM, the output size of a convolution layer is computed as:

    conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels padded around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel;
the 3 × 3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

    y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n ranges over the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position;
in the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|; thus the feature y′(p_0) of the deformable convolution is:

    y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region (offset) obtained from the U-Net network.
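As a minimal illustration, the convolution output-size formula above can be written as a small helper function (a Python sketch; the function name and example values are illustrative, not part of the patent):

    def conv_out_size(conv_in: int, kernel_size: int, padding: int = 0, stride: int = 1) -> int:
        """conv_out = (conv_in + 2*padding - kernel_size)//stride + 1."""
        return (conv_in + 2 * padding - kernel_size) // stride + 1

    # e.g. a 256-pixel input through a 3x3 kernel with stride 2 and padding 1 gives 128
    assert conv_out_size(256, kernel_size=3, padding=1, stride=2) == 128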
Furthermore, in the depth space-time deformable convolution fusion module TDAM, each frame of the 3D ventricular MRI video fed into the Deformable U-Net network is denoted x_h, and the fused feature output by the TDAM is computed as:

    F(k) = Σ_h Σ_{s=1}^{S} w_{h0,h}(s) · x_h(k + k_s)

where F(k) is the resulting feature, S is the convolution kernel size, w_{h0,h} is the kernel of the h-th channel for the current channel h0, x_h is the image of the h-th channel, k is an arbitrary spatial position, and k_s is the sampling offset of the deformable convolution; the TDAM provides additional learnable offsets δ_(h,k) and δ_(h,k),s so that

    k_s ← k_s + δ_(h,k) + δ_(h,k),s

where δ_(h,k) is the offset learned for the h-th channel at spatial position k and δ_(h,k),s is the sampling offset of the learned offset.
Further, the DGPA can extract correlated features across global pixel positions; the compensated high-dimensional MRI image feature I is input to a 3 × 3 deformable convolution kernel W_d in the deformable convolution attention module to obtain an output O:

    O = W_d(I)

The compensated high-dimensional MRI image feature I is also input to three 1 × 1 convolution kernels in the spatial attention module, generating new feature maps B, C, D ∈ R^{N×M}, where N is the number of channels of the feature map, M = H × W is the number of pixels of the feature map, and H and W are the height and width of the feature map, respectively; after transposition, C and B are matrix-multiplied and the result is passed through the softmax formula to obtain the spatial attention map S ∈ R^{M×M}, where each element of S is computed as:

    s_ji = exp(B_i · C_j^T) / Σ_{i=1}^{M} exp(B_i · C_j^T)

where s_ji is the element in row i and column j of S, B_i is the i-th row of the feature map B, and C_j^T is the j-th column of the transposed feature map C; the spatial attention map S is then matrix-multiplied with the feature map D, and the product is added to the result O of the preceding deformable convolution to obtain the final result E ∈ R^{N×M}:

    E_j = α · Σ_{i=1}^{M} (s_ji · D_i) + O_j

where α is a weight coefficient, D_i is the i-th row of the feature map D, and O_j is the j-th column of the output O.
Further, in the deep deformable convolution global attention module DGPA, an additive (summation) attention module is used to suppress irrelevant background; G is the feature map of dimension N × W × H from the down-sampling stage and X is the feature map of dimension N × W × H from the up-sampling stage; each passes through a 1 × 1 convolution kernel that changes its dimension to F_int × W × H, where F_int is a preset dimension parameter of the U-Net network; the two resulting matrices are added point by point and passed through a ReLU activation layer, the result is passed through a 1 × 1 convolution kernel so that its dimension becomes 1 × W × H, and a weight coefficient α is then obtained through a sigmoid function; the input X is multiplied by the weight coefficient α to obtain the attention feature map.
Further, during training the Deformable U-Net adopts the cross-entropy function as the loss function of the network, and the cross entropy Loss_seg is computed as:

    Loss_seg = − Σ_{c=1}^{M} y_c · log(p_c)

where M is the number of categories, y_c is the one-hot label vector, and p_c is the probability predicted by the network model for class c;
the weight parameters θ of the Deformable U-Net are updated by gradient descent with a standard Adam optimizer, of the form:

    θ_{k+1} = θ_k − η · ∇_θ Loss_seg(θ_k)

where η is the learning rate and θ_k is the weight parameter at the k-th step.
The invention has the following beneficial effects:
1) depth features in 3D ventricular MRI image video data can be automatically learned. Conventional visual assessment requires observation and judgment of a doctor frame by frame, is extremely dependent on the experience and skill level of the doctor, and consumes a lot of time. DeU-Net is capable of automatically learning high-dimensional features in 3D ventricular MRI image video data to discover intrinsic associations between MRI images and portions of the ventricle. Compared with the traditional ventricular segmentation system, the system provided by the invention can learn high-order features which are difficult to recognize by human eyes.
2) Accurate segmentation of each part of the ventricle can be realized. The system provided by the invention can accurately segment the ventricle image of the patient, and compared with the existing segmentation algorithm based on the depth network, the left ventricle, the cardiac muscle and the right ventricle area segmented by the system are more consistent with the visual evaluation of a doctor, and higher accuracy and efficiency are kept. Therefore, the method has high value in helping a doctor to locate the ventricular area of the patient and the subsequent surgical treatment.
3) The system can also be adapted to organ segmentation and detection in images of different formats from different devices, such as CT images, ultrasound images and X-ray images. The system proposed by the present invention is effective for each part of the ventricle over the full time period.
4) Network training with small data volume can be realized. The invention increases the sample size by using an image enhancement mode, and trains the model and test data on the basis of the sample size, thereby avoiding overfitting of network training and improving the robustness of network training. In addition, in order to improve the segmentation of minute parts during ventricular contraction, the invention adopts a multi-frame quality enhancement mode to acquire space-time information between frames in the 3D ventricular MRI video to compensate the target image, and simultaneously uses a deformable convolution method to better fuse the compensated information into the target image, thereby enhancing the segmentation precision.
Drawings
FIG. 1 is a block diagram of a deep learning based 3D ventricular MRI video segmentation system according to an embodiment of the present invention;
FIG. 2 is a flow chart of an implementation of a deep learning based 3D ventricular nuclear magnetic resonance video segmentation system according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the construction of DeU-Net according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of DeU-Net configuration according to one embodiment of the present invention;
FIG. 5 is a graph of DeU-Net ventricular segmentation results, in accordance with one embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in fig. 1 and fig. 2, the 3D ventricular MRI video segmentation system provided by the present invention includes a 3D ventricular nuclear Magnetic Resonance (MRI) video data preprocessing module, a Deformable convolution depth attention network Deformable U-Net (DeU-Net) and an image detection module;
the 3D ventricular nuclear Magnetic Resonance (MRI) video data preprocessing module comprises a data enhancement module and a data partitioning module:
the data enhancement module: the method comprises the steps of splitting an existing 3D ventricular MRI video data set into MRI images of frames, expanding the data set in a rotating, contrast adjusting and scaling mode, and normalizing the size of the images. The 3D ventricular MRI video is divided into four directions of x, y, z and t, wherein the x, y and z represent a space coordinate system, the t represents a time axis, and video frames of an x-y plane are selected.
The data dividing module: dividing the enhanced image data into a training set and a testing set; both the training set and the test set contain complete 3D ventricular MRI images. Before an image is input into a network, front and back r frames of a target frame along the time t axis direction are selected as Deformable U-Net network input, namely 2r +1 frame MRI images.
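A minimal sketch of selecting the 2r+1 input frames around a target frame along the time axis t is shown below; the clamping of indices at the sequence boundaries is an assumption, since the patent does not specify how edge frames are handled:

    import numpy as np

    def select_frames(video: np.ndarray, t: int, r: int = 1) -> np.ndarray:
        """video: T x H x W frames of one slice over time -> (2r+1) x H x W stack."""
        length = video.shape[0]
        idx = [min(max(i, 0), length - 1) for i in range(t - r, t + r + 1)]
        return video[idx]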
As shown in fig. 3 and 4, the Deformable convolved deep attention network Deformable U-Net (DeU-Net) includes a deep spatiotemporal Deformable convolution fusion module TDAM and a deep Deformable convolved global attention module DGPA:
the depth space-time deformable convolution fusion module TDAM comprises 9 layers of structures, the first layer comprises a convolution layer and a Relu function, and the input channel number (2r +1) × in _ c is converted into nf, wherein in _ c is the input image channel number, and nf is the output channel number of the custom convolution layer. Layers 2 to 4 are downsampled structures comprising two convolutional layers and two Relu functions. Layers 5 to 6 are upsampled structures comprising a convolutional layer, an anti-convolutional layer and two Relu functions. The 7 th layer is a jump transfer structure, and the features obtained by down sampling are processed and then fused with an up sampling result, and the jump transfer structure comprises two convolution layers, an anti-convolution layer and three Relu functions. The 8 th layer is an offset output structure and comprises two convolution layers and a Relu function, and the second convolution layer outputs the channel number (2r +1) × 2 × (defem _ ks)2Where deform _ ks is the deformable convolution kernel size. The ninth layer structure is a deformable convolution and a Relu function, and the input image and the offset are used as the layer input to obtain the high-dimensional characteristic fused feature maps of the image. In all convolution and deconvolution in upsampling and downsampling, the step size is 2, padding is 1, and the number of channels is the same. The rest(s)Convolution step size is 1 and padding is 0 to preserve feature size. The calculation process of the convolutional layer is as follows:
    conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels padded around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel.
The 3 × 3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

    y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n ranges over the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position.
In the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|; thus the feature y′(p_0) of the deformable convolution is:

    y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region (offset) obtained from the U-Net network.
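As a minimal, hedged illustration of this offset-augmented sampling (not the patent's own implementation), torchvision's DeformConv2d can be driven with an explicit offset tensor carrying the 2·|R| = 18 offset values per output pixel for a 3 × 3 kernel; setting the offsets Δp_n to zero reduces it to a standard convolution:

    import torch
    from torchvision.ops import DeformConv2d

    x = torch.randn(1, 8, 64, 64)                  # input features x(.)
    offsets = torch.zeros(1, 2 * 3 * 3, 64, 64)    # Δp_n = 0 for every kernel position
    deform = DeformConv2d(8, 8, kernel_size=3, padding=1)
    y = deform(x, offsets)                         # y'(p_0) = Σ w(p_n)·x(p_0 + p_n + Δp_n)
    print(y.shape)                                 # torch.Size([1, 8, 64, 64])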
The frame-by-frame 3D ventricular MRI images fed into the Deformable U-Net network are denoted x_h, and the fused feature output by the TDAM is computed as:

    F(k) = Σ_h Σ_{s=1}^{S} w_{h0,h}(s) · x_h(k + k_s)

where F(k) is the resulting feature, S is the convolution kernel size, w_{h0,h} is the kernel of the h-th channel for the current channel h0, x_h is the image of the h-th channel, k is an arbitrary spatial position, and k_s is the sampling offset of the deformable convolution; the TDAM provides additional learnable offsets δ_(h,k) and δ_(h,k),s so that

    k_s ← k_s + δ_(h,k) + δ_(h,k),s

where δ_(h,k) is the learnable offset field of the h-th channel at position k and δ_(h,k),s is the learned sampling offset of that offset. The overall offset prediction network therefore produces the offset field

    {δ_(h,k), δ_(h,k),s} = N_offset(x_{t−r}, …, x_{t+r})

where N_offset(·) is a U-Net network taking the 2r+1 input frames.
The activation functions used in the TDAM are rectified linear units (ReLU), except for the last layer, which uses a linear activation. The rectified linear unit g(z) is computed as:

    g(z) = max(0, z)

and the linear activation function g(z) is computed as:

    g(z) = z
the input data in TDAM is bx (2r +1) × 3 × H × W, where B is the batch size, 2r +1 is the input MRI image frame number, 3 is the image channel number, H is the image height, and W is the image width. In this embodiment, the input MRI image size is 12 × 3 × 3 × 256 × 256, and the size is changed to 12 × 32 × 256 × 256 after passing through the first layer structure. After the third down-sampling of the 2 nd to 4 th layers, the obtained data sizes are 12 × 32 × 128 × 128, 12 × 32 × 64 × 64, and 12 × 32 × 32 × 32 in this order. After 5 th to 6 th layers are subjected to up-sampling twice, the obtained image feature sizes are 12 multiplied by 32 multiplied by 64 and 12 multiplied by 32 multiplied by 128 in sequence. The image feature size obtained by the layer 7 skip pass structure is 12 × 32 × 128 × 128, and is merged with the upsampled input before being transmitted to the upsampled structure. The 8 th layer is an offset output structure, and the obtained image feature size is 12 × 54 × 256 × 256. The image feature size of the fused feature maps obtained by the ninth layer of space-time deformable convolution structure is 12 × 64 × 256 × 256.
The depth deformable convolution global attention module DGPA comprises a U-Net network, a deformable convolution attention module, three additive attention modules (Attention Gates) and an output layer. The deformable convolution attention module builds on a spatial attention module: its input is also passed through a deformable convolution layer, and the computed output of the deformable convolution layer is added to the output of the spatial attention module to give the output of the deformable convolution attention module. A deformable convolution attention module is added in the first-layer skip connection of the U-Net network, and an Attention Gates module is added in each of the other three skip connections. The compensated high-dimensional fused feature maps of the images are input into the U-Net network, whereupon the number of input channels becomes 64 and the image sizes are unchanged.
Down-sampling is then performed 4 times. Each down-sampling operation includes a convolution with a 3 × 3 kernel that doubles the number of channels, so the image channel count becomes 128, 256, 512 and 1024 in sequence. After each convolution, nonlinear features are obtained through a ReLU activation function. A max-pooling operation with a 2 × 2 pooling kernel halves the picture size, i.e., to 128 × 128, 64 × 64, 32 × 32 and 16 × 16 in sequence.
Up-sampling is then performed 4 times. Each up-sampling operation includes a convolution with a 3 × 3 kernel that halves the number of channels, so the image channel count becomes 1024, 512, 256 and 128 in sequence. After each convolution, nonlinear features are obtained through a ReLU activation function. Linear interpolation doubles the picture size, i.e., to 32 × 32, 64 × 64, 128 × 128 and 256 × 256 in sequence.
Meanwhile, the input also enters the DGPA module to extract globally correlated pixel features, which are concatenated with the output of the DGPA network. The results of the first three down-sampling stages pass through the additive attention network to suppress irrelevant background and are then concatenated with the results of the first three up-sampling stages.
In the depth deformable convolution global attention module DGPA, a deformable-convolution global attention network is adopted to extract globally correlated pixel features. The compensated high-dimensional MRI image feature I is input to a 3 × 3 deformable convolution kernel W_d in the DGPA module to obtain an output O:

    O = W_d(I)

The compensated high-dimensional MRI image feature I is also input to three 1 × 1 convolution kernels in the spatial attention module, generating new feature maps B, C, D ∈ R^{N×M}, where N is the number of channels of the feature map, M = H × W is the number of pixels of the feature map, and H and W are the height and width of the feature map, respectively. After transposition, C and B are matrix-multiplied and the result is passed through the softmax formula to obtain the spatial attention map S ∈ R^{M×M}, where each element of S is computed as:

    s_ji = exp(B_i · C_j^T) / Σ_{i=1}^{M} exp(B_i · C_j^T)

where s_ji is the element in row i and column j of S, B_i is the i-th row of the feature map B, and C_j^T is the j-th column of the transposed feature map C. The spatial attention map S is then matrix-multiplied with the feature map D, and the product is added to the result O of the preceding deformable convolution to obtain the final result E ∈ R^{N×M}:

    E_j = α · Σ_{i=1}^{M} (s_ji · D_i) + O_j

where α is a weight coefficient, D_i is the i-th row of the feature map D, and O_j is the j-th column of the output O.
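A minimal PyTorch sketch of this deformable convolution attention module is given below, assuming the weight coefficient α is a learnable scalar initialized to zero and that the offsets for the deformable branch come from a plain 3 × 3 convolution; both choices, and all class and variable names, are illustrative assumptions rather than the patent's exact design:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.ops import DeformConv2d

    class DeformableConvAttention(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.offset = nn.Conv2d(channels, 2 * 3 * 3, 3, padding=1)    # offsets for the 3x3 kernel
            self.deform = DeformConv2d(channels, channels, 3, padding=1)  # deformable branch -> O
            self.proj_b = nn.Conv2d(channels, channels, 1)                # 1x1 conv -> B
            self.proj_c = nn.Conv2d(channels, channels, 1)                # 1x1 conv -> C
            self.proj_d = nn.Conv2d(channels, channels, 1)                # 1x1 conv -> D
            self.alpha = nn.Parameter(torch.zeros(1))                     # weight coefficient alpha

        def forward(self, x):                       # x: compensated features I, B x N x H x W
            n, c, h, w = x.shape
            o = self.deform(x, self.offset(x))                            # output O
            b = self.proj_b(x).view(n, c, h * w)                          # B: N x M
            cm = self.proj_c(x).view(n, c, h * w)                         # C: N x M
            d = self.proj_d(x).view(n, c, h * w)                          # D: N x M
            s = F.softmax(torch.bmm(cm.transpose(1, 2), b), dim=-1)       # spatial attention map, M x M
            e = torch.bmm(d, s.transpose(1, 2)).view(n, c, h, w)          # S applied to D
            return self.alpha * e + o                                     # E = alpha * (S·D) + O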
In the deep deformable convolution global attention module DGPA, an additive (summation) attention module is used to suppress irrelevant background. G is the feature map of dimension N × W × H from the down-sampling stage and X is the feature map of dimension N × W × H from the up-sampling stage; each passes through a 1 × 1 convolution kernel that changes its dimension to F_int × W × H, where F_int is a preset dimension parameter of the U-Net network. The two resulting matrices are added point by point and passed through a ReLU activation layer; after a further 1 × 1 convolution kernel, the dimension of the result becomes 1 × W × H, and a weight coefficient α is then obtained through a sigmoid function. The input X is multiplied by the weight coefficient α to obtain the attention feature map. The output is normalized to between 0 and 1, and the magnitude of each value represents how strongly that point correlates with the recognition result: the larger the value, the more likely it is that the point belongs to the object of interest. Irrelevant areas are thus suppressed, improving the accuracy of image recognition.
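A minimal sketch of this additive attention module follows; it assumes G and X share the same spatial size (as in the text) and uses a single 1 × 1 convolution per branch, with names chosen for illustration only:

    import torch
    import torch.nn as nn

    class AdditiveAttentionGate(nn.Module):
        def __init__(self, channels: int, f_int: int):
            super().__init__()
            self.w_g = nn.Conv2d(channels, f_int, 1)   # G: N x W x H -> F_int x W x H
            self.w_x = nn.Conv2d(channels, f_int, 1)   # X: N x W x H -> F_int x W x H
            self.psi = nn.Conv2d(f_int, 1, 1)          # F_int x W x H -> 1 x W x H

        def forward(self, g, x):
            alpha = torch.sigmoid(self.psi(torch.relu(self.w_g(g) + self.w_x(x))))
            return x * alpha                           # attention feature map, reweighted by alpha in [0, 1]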
In model training, the cross-entropy function is adopted as the loss function of the network, and the cross entropy Loss_seg is computed as:

    Loss_seg = − Σ_{c=1}^{M} y_c · log(p_c)

where M is the number of categories, y_c is the one-hot label vector, and p_c is the probability predicted by the network model for class c.
The weight parameters θ are updated by gradient descent with a standard Adam optimizer, of the form:

    θ_{k+1} = θ_k − η · ∇_θ Loss_seg(θ_k)

where η is the learning rate and θ_k is the weight parameter at the k-th step.
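A hedged sketch of one training step is given below: pixel-wise cross entropy (softmax plus negative log-likelihood) optimized with a standard Adam optimizer. The stand-in model, the learning rate of 1e-4 and the assumption of M = 4 classes (background plus left ventricle, myocardium and right ventricle) are illustrative; in the real system the model would be the full DeU-Net:

    import torch
    import torch.nn as nn

    # Stand-in for the trained Deformable U-Net: 9 input channels (2r+1 frames x 3) -> 4 class maps.
    model = nn.Conv2d(9, 4, kernel_size=3, padding=1)
    criterion = nn.CrossEntropyLoss()                          # Loss_seg = -sum_c y_c * log(p_c)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
        """images: B x 9 x H x W stacked frames, labels: B x H x W integer class map."""
        optimizer.zero_grad()
        logits = model(images)            # B x M x H x W class scores
        loss = criterion(logits, labels)
        loss.backward()                   # gradients of Loss_seg w.r.t. theta
        optimizer.step()                  # theta_{k+1} = theta_k - eta * Adam update
        return loss.item()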
The image detection module is used to segment the 3D ventricular regions: the probability heat maps of the 3D ventricular MRI images in the test set are computed with the trained network, and the probability heat map corresponding to each ventricular MRI image is segmented according to the segmentation probability obtained by the DGPA, yielding the segmentation results, namely the left ventricle region, the myocardium region and the right ventricle region.
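A minimal sketch of this detection step is to take the per-pixel argmax of the softmax probability heat maps; the class-index order (0 background, 1 left ventricle, 2 myocardium, 3 right ventricle) is an assumption for illustration:

    import torch

    def segment(prob_maps: torch.Tensor) -> torch.Tensor:
        """prob_maps: B x M x H x W softmax probabilities -> B x H x W label map."""
        return prob_maps.argmax(dim=1)

    probs = torch.softmax(torch.randn(1, 4, 256, 256), dim=1)   # dummy probability heat maps
    masks = segment(probs)
    left_ventricle = (masks == 1)                                # boolean mask of the left-ventricle region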
In a specific case of applying the system of this embodiment, as shown in fig. 5, the acquired 3D ventricular MRI dataset is first divided into a training set and a test set. A depth space-time deformable convolution fusion module TDAM is constructed with a U-Net network to obtain the offsets of the target image within the 3D ventricular MRI video segment, and the obtained offsets are fused into the target image by deformable convolution. The result is then input into the deep deformable convolution global attention module DGPA to extract globally correlated pixel features, and irrelevant portions are suppressed with the additive attention network to obtain the segmentation result map, realizing accurate segmentation of the patient's ventricle in the 3D ventricular MRI video. The Dice score of the whole video segmentation result is 90.1%; compared with existing segmentation algorithms based on deep neural networks, the left ventricle, myocardium and right ventricle regions segmented by this system agree better with visual evaluation while maintaining high accuracy and efficiency.
The present invention is not limited to the above-described preferred embodiments. Any person can derive various other forms of deep-learning-based ventricular segmentation systems according to the teaching of the present invention, and all equivalent changes and modifications made within the scope of this application shall fall within the scope of the present invention.

Claims (9)

1. A 3D ventricular magnetic resonance video segmentation system based on deep learning, characterized by comprising a 3D ventricular magnetic resonance MRI video data preprocessing module, a Deformable convolution depth attention network Deformable U-Net and an image detection module;
the 3D ventricular magnetic resonance MRI video data preprocessing module comprises a data enhancement module and a data division module:
the data enhancement module: splitting the existing 3D ventricular MRI video data set into MRI images of each frame, expanding the data set and carrying out normalization processing on the image size;
the data dividing module: dividing the enhanced image data into a training set and a testing set; the training set and the test set both comprise complete 3D ventricular MRI images, and the training set is used for training a Deformable convolution depth attention network Deformable U-Net;
the Deformable convolution depth attention network Deformable U-Net comprises a depth space-time Deformable convolution fusion module TDAM and a depth Deformable convolution global attention module DGPA:
the depth space-time deformable convolution fusion module TDAM: the module comprises a U-Net network and a deformable convolution layer; the TDAM inputs each frame image of the continuous 3D ventricular MRI video on a time axis into a U-Net network, outputs the image as a high-dimensional characteristic compensation area offset field of the image in the MRI video segment, transmits the high-dimensional characteristic compensation area offset field and the input image into a deformable convolution layer, and calculates to obtain high-dimensional characteristic fused feature maps of the compensated image, namely the fused feature containing the information of the front frame and the rear frame;
the deep deformable convolution global attention module DGPA: the module comprises a U-Net network, a deformable convolution attention module, three summation attention modules and an output layer, wherein the deformable convolution attention module passes through a deformable convolution layer on the basis of a spatial attention module, and adds the output of the deformable convolution layer obtained through calculation with the output of the spatial attention module to finally obtain the output of the deformable convolution attention module; adding a deformable convolution attention module in a first layer skip connection of the U-Net network, and adding a sum attention module in the other three layers of skip connections; inputting the compensated high-dimensional features of the image into a U-Net network, transmitting the high-dimensional features calculated by the U-Net network into an output layer to obtain an attention feature map, and then obtaining a segmentation probability by adopting a softmax regression function, namely the probability that a certain region in the MRI image belongs to the left ventricle, the myocardium or the right ventricle;
the image detection module is used for segmenting a 3D ventricle area, the probability heat map of the 3D ventricle MRI image of the test set is calculated by using a trained network, the probability heat map corresponding to each ventricle MRI image is segmented according to the segmentation probability obtained by DGPA, and segmentation results, namely the left ventricle area, the myocardial area and the right ventricle area, are obtained.
2. The deep learning-based 3D ventricular MRI video segmentation system according to claim 1, wherein the data enhancement module expands the data set by rotation, contrast adjustment and scaling during image processing to divide the 3D ventricular MRI video into four directions of x, y, z and t, wherein x, y and z represent a spatial coordinate system, t represents a time axis, and video frames of an x-y plane are selected.
3. The 3D ventricular MRI video segmentation system based on deep learning of claim 1, characterized in that before each frame of MRI image is inputted into the Deformable U-Net network, the previous and subsequent r frames of the target frame along the time t axis direction are selected as the Deformable U-Net network input, i.e. 2r +1 frame of MRI image.
4. The deep learning based 3D ventricular MRI video segmentation system of claim 1, wherein the depth space-time deformable convolution fusion module TDAM comprises a 9-layer structure: the first layer comprises a convolution layer and a ReLU function and converts the number of input channels, (2r+1) × in_c, into nf, where in_c is the number of input image channels and nf is the number of output channels of the custom convolution layer; layers 2 to 4 are down-sampling structures comprising two convolution layers and two ReLU functions; layers 5 to 6 are up-sampling structures comprising a convolution layer, a deconvolution layer and two ReLU functions; layer 7 is a skip-transfer structure that processes the features obtained by down-sampling and fuses them with the up-sampling result, comprising two convolution layers, a deconvolution layer and three ReLU functions; layer 8 is an offset output structure comprising two convolution layers and a ReLU function, with the second convolution layer outputting (2r+1) × 2 × (deform_ks)² channels, where deform_ks is the deformable convolution kernel size; the ninth layer is a deformable convolution and a ReLU function that takes the input image and the offsets as its input and produces the high-dimensional fused feature maps of the image.
5. The deep learning-based 3D ventricular nuclear magnetic resonance video segmentation system according to claim 1, wherein in the depth space-time deformable convolution fusion module TDAM, the output size of a convolution layer is computed as:

    conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels padded around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel;
the 3 × 3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

    y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n ranges over the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position;
in the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|; thus the feature y′(p_0) of the deformable convolution is:

    y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region (offset) obtained from the U-Net network.
6. The deep learning-based 3D ventricular MRI video segmentation system according to claim 1, wherein in the depth space-time deformable convolution fusion module TDAM, each frame of the 3D ventricular MRI video fed into the Deformable U-Net network is denoted x_h, and the fused feature output by the TDAM is computed as:

    F(k) = Σ_h Σ_{s=1}^{S} w_{h0,h}(s) · x_h(k + k_s)

where F(k) is the resulting feature, S is the convolution kernel size, w_{h0,h} is the kernel of the h-th channel for the current channel h0, x_h is the image of the h-th channel, k is an arbitrary spatial position, and k_s is the sampling offset of the deformable convolution; the TDAM provides additional learnable offsets δ_(h,k) and δ_(h,k),s so that

    k_s ← k_s + δ_(h,k) + δ_(h,k),s

where δ_(h,k) is the offset learned for the h-th channel at spatial position k and δ_(h,k),s is the sampling offset of the learned offset.
7. The deep learning based 3D ventricular MRI video segmentation system according to claim 1, wherein the DGPA can extract correlated features across global pixel positions; the compensated high-dimensional MRI image feature I is input to a 3 × 3 deformable convolution kernel W_d in the deformable convolution attention module to obtain an output O:

    O = W_d(I)

the compensated high-dimensional MRI image feature I is also input to three 1 × 1 convolution kernels in the spatial attention module, generating new feature maps B, C, D ∈ R^{N×M}, where N is the number of channels of the feature map, M = H × W is the number of pixels of the feature map, and H and W are the height and width of the feature map, respectively; after transposition, C and B are matrix-multiplied and the result is passed through the softmax formula to obtain the spatial attention map S ∈ R^{M×M}, where each element of S is computed as:

    s_ji = exp(B_i · C_j^T) / Σ_{i=1}^{M} exp(B_i · C_j^T)

where s_ji is the element in row i and column j of S, B_i is the i-th row of the feature map B, and C_j^T is the j-th column of the transposed feature map C; the spatial attention map S is then matrix-multiplied with the feature map D, and the product is added to the result O of the preceding deformable convolution to obtain the final result E ∈ R^{N×M}:

    E_j = α · Σ_{i=1}^{M} (s_ji · D_i) + O_j

where α is a weight coefficient, D_i is the i-th row of the feature map D, and O_j is the j-th column of the output O.
8. The deep learning based 3D ventricular nuclear magnetic resonance video segmentation system of claim 7, wherein in the deep deformable convolution global attention module DGPA, an additive (summation) attention module is used to suppress irrelevant background; G is the feature map of dimension N × W × H from the down-sampling stage and X is the feature map of dimension N × W × H from the up-sampling stage; each passes through a 1 × 1 convolution kernel that changes its dimension to F_int × W × H, where F_int is a preset dimension parameter of the U-Net network; the two resulting matrices are added point by point and passed through a ReLU activation layer, the result is passed through a 1 × 1 convolution kernel so that its dimension becomes 1 × W × H, and a weight coefficient α is then obtained through a sigmoid function; the input X is multiplied by the weight coefficient α to obtain the attention feature map.
9. The deep learning-based 3D ventricular nuclear magnetic resonance video segmentation system according to claim 1, wherein during training the Deformable U-Net adopts the cross-entropy function as the loss function of the network, and the cross entropy Loss_seg is computed as:

    Loss_seg = − Σ_{c=1}^{M} y_c · log(p_c)

where M is the number of categories, y_c is the one-hot label vector, and p_c is the probability predicted by the network model for class c;
the weight parameters θ of the Deformable U-Net are updated by gradient descent with a standard Adam optimizer, of the form:

    θ_{k+1} = θ_k − η · ∇_θ Loss_seg(θ_k)

where η is the learning rate and θ_k is the weight parameter at the k-th step.
CN202010622947.1A 2020-07-01 2020-07-01 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning Active CN111932550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010622947.1A CN111932550B (en) 2020-07-01 2020-07-01 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010622947.1A CN111932550B (en) 2020-07-01 2020-07-01 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning

Publications (2)

Publication Number Publication Date
CN111932550A CN111932550A (en) 2020-11-13
CN111932550B true CN111932550B (en) 2021-04-30

Family

ID=73316977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010622947.1A Active CN111932550B (en) 2020-07-01 2020-07-01 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning

Country Status (1)

Country Link
CN (1) CN111932550B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330683B (en) * 2020-11-16 2022-07-29 的卢技术有限公司 Lineation parking space segmentation method based on multi-scale convolution feature fusion
CN112733672B (en) * 2020-12-31 2024-06-18 深圳一清创新科技有限公司 Three-dimensional target detection method and device based on monocular camera and computer equipment
CN112766195B (en) * 2021-01-26 2022-03-29 西南交通大学 Electrified railway bow net arcing visual detection method
CN113436139A (en) * 2021-05-10 2021-09-24 上海大学 Small intestine nuclear magnetic resonance image identification and physiological information extraction system and method based on deep learning
CN113283529B (en) * 2021-06-08 2022-09-06 南通大学 Neural network construction method for multi-modal image visibility detection
CN114004847B (en) * 2021-11-01 2023-06-16 中国科学技术大学 Medical image segmentation method based on graph reversible neural network
CN114155208B (en) * 2021-11-15 2022-07-08 中国科学院深圳先进技术研究院 Atrial fibrillation assessment method and device based on deep learning
CN114359310B (en) * 2022-01-13 2024-06-04 浙江大学 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning
CN116612131B (en) * 2023-05-22 2024-02-13 山东省人工智能研究院 Cardiac MRI structure segmentation method based on ADC-UNet model
CN116630628B (en) * 2023-07-17 2023-10-03 四川大学 Aortic valve calcification segmentation method, system, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447008A (en) * 2018-11-02 2019-03-08 中山大学 Population analysis method based on attention mechanism and deformable convolutional neural networks
CN109685813A (en) * 2018-12-27 2019-04-26 江西理工大学 A kind of U-shaped Segmentation Method of Retinal Blood Vessels of adaptive scale information
CN110120033A (en) * 2019-04-12 2019-08-13 天津大学 Based on improved U-Net neural network three-dimensional brain tumor image partition method
CN110163876A (en) * 2019-05-24 2019-08-23 山东师范大学 Left ventricle dividing method, system, equipment and medium based on multi-feature fusion
CN111161273A (en) * 2019-12-31 2020-05-15 电子科技大学 Medical ultrasonic image segmentation method based on deep learning
CN111275755A (en) * 2020-04-28 2020-06-12 中国人民解放军总医院 Mitral valve orifice area detection method, system and equipment based on artificial intelligence

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260651B (en) * 2018-11-30 2023-11-10 西安电子科技大学 Stomach low-quality MRI image segmentation method based on deep migration learning
CN111192245B (en) * 2019-12-26 2023-04-07 河南工业大学 Brain tumor segmentation network and method based on U-Net network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447008A (en) * 2018-11-02 2019-03-08 中山大学 Population analysis method based on attention mechanism and deformable convolutional neural networks
CN109685813A (en) * 2018-12-27 2019-04-26 江西理工大学 A kind of U-shaped Segmentation Method of Retinal Blood Vessels of adaptive scale information
CN110120033A (en) * 2019-04-12 2019-08-13 天津大学 Based on improved U-Net neural network three-dimensional brain tumor image partition method
CN110163876A (en) * 2019-05-24 2019-08-23 山东师范大学 Left ventricle dividing method, system, equipment and medium based on multi-feature fusion
CN111161273A (en) * 2019-12-31 2020-05-15 电子科技大学 Medical ultrasonic image segmentation method based on deep learning
CN111275755A (en) * 2020-04-28 2020-06-12 中国人民解放军总医院 Mitral valve orifice area detection method, system and equipment based on artificial intelligence

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Attention U-Net: Learning Where to Look for the Pancreas; Ozan Oktay et al.; arXiv:1804.03999v3 [cs.CV]; 2018-05-20; full text *
Deformable Convolutional Networks; Jifeng Dai et al.; arXiv:1703.06211v3 [cs.CV]; 2017-06-05; full text *
DUNet: A deformable network for retinal vessel segmentation; Qiangguo Jin et al.; Journal of LaTeX Class Files; 2015-08-31; Vol. 14, No. 8; full text *
Research status and development of left ventricle segmentation methods based on MRI images; Zhou Qin et al.; Computer Engineering and Applications (计算机工程与应用); 2019-12-31; Vol. 55, No. 2; full text *

Also Published As

Publication number Publication date
CN111932550A (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN111932550B (en) 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning
CN111145170B (en) Medical image segmentation method based on deep learning
CN110390351B (en) Epileptic focus three-dimensional automatic positioning system based on deep learning
CN107492071B (en) Medical image processing method and equipment
US11430140B2 (en) Medical image generation, localizaton, registration system
US20200167929A1 (en) Image processing method, image processing apparatus, and computer-program product
CN114359310B (en) 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning
CN110930416A (en) MRI image prostate segmentation method based on U-shaped network
CN110599528A (en) Unsupervised three-dimensional medical image registration method and system based on neural network
CN111444896A (en) Method for positioning human meridian key points through far infrared thermal imaging
CN111951288A (en) Skin cancer lesion segmentation method based on deep learning
CN111161271A (en) Ultrasonic image segmentation method
CN110648331A (en) Detection method for medical image segmentation, medical image segmentation method and device
CN116258933A (en) Medical image segmentation device based on global information perception
CN116258732A (en) Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images
CN117392312A (en) New view image generation method of monocular endoscope based on deformable nerve radiation field
CN114119635B (en) Fatty liver CT image segmentation method based on cavity convolution
CN117523204A (en) Liver tumor image segmentation method and device oriented to medical scene and readable storage medium
CN113269774A (en) Parkinson disease classification and lesion region labeling method of MRI (magnetic resonance imaging) image
CN116758087A (en) Lumbar vertebra CT bone window side recess gap detection method and device
CN116757982A (en) Multi-mode medical image fusion method based on multi-scale codec
CN116309754A (en) Brain medical image registration method and system based on local-global information collaboration
CN115424319A (en) Strabismus recognition system based on deep learning
CN117274282B (en) Medical image segmentation method, system and equipment based on knowledge distillation
Sarkera et al. MobileGAN: Skin Lesion Segmentation Using a Lightweight Generative Adversarial Network [J]

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant