CN114359310B - 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning

Info

Publication number
CN114359310B
Authority
CN
China
Prior art keywords
ventricular
layer
feature
convolution
module
Prior art date
Legal status
Active
Application number
CN202210035567.7A
Other languages
Chinese (zh)
Other versions
CN114359310A (en)
Inventor
Shunjie Dong (董舜杰)
Zixuan Pan (潘子宣)
Cheng Zhuo (卓成)
Yu Fu (付钰)
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210035567.7A priority Critical patent/CN114359310B/en
Publication of CN114359310A publication Critical patent/CN114359310A/en
Application granted granted Critical
Publication of CN114359310B publication Critical patent/CN114359310B/en



Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning. A depth spatiotemporal deformable convolution fusion module TDAM acquires high-dimensional image features from an MRI video segment; an enhanced deformable convolution attention network EDAN fuses feature maps of different scales by using the spatiotemporal information in these high-dimensional image features and outputs a feature map carrying multi-scale information; a probabilistic noise correction module PNCM models the distribution of the high-dimensional image features and outputs an embedded vector containing the mean and variance of that distribution; the feature map output by EDAN is spliced with the expanded embedded vector output by PNCM and convolved to obtain the prediction result. The trained network model directly segments newly input 3D ventricular MRI video; by introducing multi-frame image compensation, deformable convolution and a multi-scale attention mechanism, the system effectively improves the accuracy and efficiency of ventricular segmentation and offers high robustness.

Description

3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning
Technical Field
The invention relates to the technical field of medical image engineering, in particular to a 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning.
Background
With the development of medical imaging technology and artificial intelligence, automated and semi-automated systems in computer-aided diagnosis are gradually replacing traditional manual diagnostic systems for accurate diagnosis and treatment. Magnetic resonance imaging (MRI) is currently widely used in ventricular diagnostics because it causes no radiation damage and offers high resolution. To better understand the ventricular condition of a patient, accurate segmentation systems are required to correctly delineate the positions of the various parts of the ventricle; however, the conventional clinical procedure of visually assessing three-dimensional MRI images is time consuming and depends on the clinical experience of the physician. It is therefore important to find a system that improves the accuracy and efficiency of diagnosis for the various parts of the ventricle.
The challenges faced by the prior art are mainly: 1. Complex motion and blood flow within the heart cause the imaging data to contain significant amounts of motion artifacts, intensity non-uniformity, and noise, so a detection system may miss subtle changes, reducing detection sensitivity. 2. Most existing algorithms are only suitable for processing two-dimensional natural images, while MRI data are three-dimensional structures composed of parallel scanned image frames, so two-dimensional positioning algorithms ignore important inter-frame information. 3. The shape of the heart may vary greatly across different states. This deformation is particularly pronounced in patients suffering from heart disease: the heart chambers can deform severely with breathing, so regions of the same nature become greatly deformed, especially the myocardium surrounding the left ventricle and the right ventricular portion, which creates significant interference and challenges for a segmentation system. 4. Because medical image data volumes are small and high-quality labeled data and training samples are lacking, trained models generalize poorly and are prone to overfitting.
In summary, providing a deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation optimization system that improves the accuracy and efficiency of ventricular segmentation by exploiting the continuity information within and between MRI video image frames has become an important technical problem to be solved urgently.
Disclosure of Invention
Aiming at the defects of the prior art of ventricular segmentation of the current medical image, the invention provides a 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning, which is used for automatically segmenting the positions of all parts of a ventricle, and has high accuracy of a positioning result and higher robustness of a model.
The aim of the invention is realized by the following technical scheme: a 3D ventricular MRI video segmentation optimization system based on deep learning comprises a 3D ventricular MRI video data preprocessing module, a deformable convolution depth attention network DeU-Net and an image detection module;
The 3D ventricular MRI video data preprocessing module performs normalization processing on each frame of MRI image in the existing 3D ventricular MRI video data set and inputs the result into the deformable convolution depth attention network DeU-Net;
The deformable convolution depth attention network DeU-Net includes a depth spatiotemporal deformable convolution fusion module TDAM, an enhanced deformable convolution attention network EDAN and a probabilistic noise correction module PNCM:
the depth spatiotemporal deformable convolution fusion module TDAM: the module comprises a U-Net network and a deformable convolution layer; the U-Net network takes the 3D ventricular MRI video data set as input and outputs a high-dimensional feature compensation region (offset field) for the images in the MRI video segment; the deformable convolution layer takes the 3D ventricular MRI video data set and the high-dimensional feature compensation region as input to obtain the compensated high-dimensional image features;
the enhanced deformable convolution attention network EDAN: the module comprises a downsampling channel and an upsampling channel. EDAN takes the high-dimensional image features output by TDAM as input; L layers of downsampling yield the feature F_L^down, which is convolved to obtain the original feature F_L^up of the L-th layer of the upsampling channel. F_L^down and F_L^up are spliced, and the spliced feature is input together with F_L^down into the DeConv(a) module, which obtains the offset Δ_L and the fused L-th layer feature F̃_L through deformable convolution. The specific calculation is:

Δ_L = Conv(Concat(F_L^down, F_L^up)),  F̃_L = Conv(DeformConv(F_L^down, Δ_L))

The original feature F_i^up of the i-th layer of the upsampling channel is spliced with the feature F_i^down of the corresponding layer in the downsampling channel; the spliced feature, Δ_{i+1} and F̃_{i+1} are input together into the DeConv(b) module, where i ∈ [1, L−1]. Finally the fused feature F̃_1 of the last layer of the upsampling channel is obtained.

The DeConv(b) module includes a multi-scale attention module MSAM and a deformable convolution layer. F̃_{i+1} is bilinearly interpolated, spliced with Δ_{i+1} and convolved to obtain Δ_i. The inputs of MSAM are the feature x (H×W) obtained by deformable convolution of the spliced feature with Δ_i and the convolved feature y of F̃_{i+1}, where H and W denote the height and width of the input feature, respectively. The calculation formula of MSAM is:

z_k = (1/σ(x)) Σ_j φ(x_k, y_j) θ(y_j)

where k and j index the inputs x, y and the output z, k ∈ [1, H×W], σ(·) is a scalar normalization function, φ(·) computes the pairwise correlation between x and y, and θ(·) is the feature transfer function.

The feature obtained by bilinear interpolation of F̃_{i+1} is spliced with z and x (H×W) and then convolved to obtain F̃_i.
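For concreteness, the MSAM computation above can be illustrated with a short PyTorch sketch. This is a minimal illustration only: the 1×1 convolutions standing in for W_f, W_g, W_θ and the softmax used as the normalization σ(·) are assumptions not fixed by the text, not the patented implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MSAM(nn.Module):
        """Sketch of z_k = (1/sigma(x)) * sum_j phi(x_k, y_j) * theta(y_j),
        with phi(x_k, y_j) = f(x_k)^T g(y_j); f, g, theta are 1x1 convolutions
        (an assumption), and softmax over j plays the role of sigma(.)."""
        def __init__(self, c):
            super().__init__()
            self.f = nn.Conv2d(c, c, kernel_size=1)      # W_f
            self.g = nn.Conv2d(c, c, kernel_size=1)      # W_g
            self.theta = nn.Conv2d(c, c, kernel_size=1)  # W_theta

        def forward(self, x, y):
            b, c, h, w = x.shape
            fx = self.f(x).flatten(2)                    # B x C x (H*W)
            gy = self.g(y).flatten(2)                    # B x C x N_y
            ty = self.theta(y).flatten(2)                # B x C x N_y
            corr = torch.einsum('bck,bcj->bkj', fx, gy)  # pairwise correlation phi
            attn = F.softmax(corr, dim=-1)               # normalization sigma(.)
            z = torch.einsum('bkj,bcj->bck', attn, ty)   # aggregate theta(y_j)
            return z.reshape(b, c, h, w)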
The probabilistic noise correction module PNCM: the module comprises a weight-sharing twin (Siamese) neural network formed by convolution layers, two independent feature extraction layers formed by fully connected layers, and a reparameterization operation. The twin neural network takes the high-dimensional image features output by TDAM as input; its output is fed into the two feature extraction layers, one of which outputs the mean of the high-dimensional features of the input image and the other the variance. Through the reparameterization operation the variance is combined with the mean to generate an embedded vector as the final output;
the image detection module is used for 3D ventricular region segmentation; a probability heat map of each 3D ventricular MRI image in the test set is calculated using the trained deformable convolution depth attention network DeU-Net. Specifically, the feature map F̃_1 obtained by EDAN is spliced with the expanded embedded vector obtained by PNCM, and the probability heat map is obtained through a convolution layer. The probability heat map corresponding to each ventricular MRI image is then partitioned according to the segmentation probabilities to obtain the segmentation result, namely the left ventricular region, the myocardial region and the right ventricular region.
Further, the data enhancement module expands the data set by rotation, contrast adjustment and scaling during image processing. The 3D ventricular MRI video is organized along four axes x, y, z and t, where x, y and z form the spatial coordinate system and t is the time axis; video frames of the x-y plane are selected.
Further, before each frame of MRI image is input into the Deformable U-Net network, the r frames before and after the target frame along the time (t) axis are selected together with the target frame as the network input, i.e. 2r+1 frames of MRI images.
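A minimal sketch of this frame selection (boundary handling by clamping is an assumption the patent does not specify):

    def frame_window(video, t, r):
        """Return the 2r+1 frames centered on target frame t from a sequence
        indexed along the time axis, clamping indices at the boundaries."""
        T = len(video)
        idx = [min(max(i, 0), T - 1) for i in range(t - r, t + r + 1)]
        return [video[i] for i in idx]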
Further, in the depth spatiotemporal deformable convolution fusion module TDAM, the calculation process of the convolution layer is as follows:

conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels filled around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel;
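A one-line check of this size formula (floor division, as in standard convolution arithmetic):

    def conv_out_size(conv_in, padding, kernel_size, stride):
        # conv_out = (conv_in + 2*padding - kernel_size) // stride + 1
        return (conv_in + 2 * padding - kernel_size) // stride + 1

    # e.g. a 256-pixel input with a 3x3 kernel, padding 1, stride 1 keeps its size:
    assert conv_out_size(256, 1, 3, 1) == 256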
The 3×3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n enumerates the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position;
in the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|; thus, the feature y′(p_0) of the deformable convolution is:

y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region obtained through the U-Net network.
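The deformable feature y′(p_0) corresponds directly to torchvision's deform_conv2d. The sketch below uses a plain convolution as a stand-in for the U-Net offset predictor; all tensor sizes are example assumptions.

    import torch
    from torchvision.ops import deform_conv2d

    x = torch.randn(1, 16, 64, 64)                    # input features x(.)
    weight = torch.randn(32, 16, 3, 3)                # weights w(.) for the 3x3 kernel R
    offset_net = torch.nn.Conv2d(16, 2 * 3 * 3, 3, padding=1)  # stand-in offset predictor
    offset = offset_net(x)                            # offsets {Delta p_n}: 2 values per p_n
    y = deform_conv2d(x, offset, weight, padding=1)   # y'(p0) = sum_n w(p_n) x(p0 + p_n + Dp_n)
    print(y.shape)                                    # torch.Size([1, 32, 64, 64])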
Further, the enhanced deformable convolution attention network EDAN includes an upsampling channel and a downsampling channel, each containing an L-layer structure. Each layer in the downsampling channel comprises two 3×3 convolution layers and a 2×2 max-pooling layer with stride 2, each convolution layer being followed by a batch normalization operation and a ReLU function. Each layer in the upsampling channel uses a 3×3 up-convolution that doubles the feature resolution, followed by a batch normalization operation and a ReLU function, as sketched below.
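One downsampling layer of EDAN can be sketched as follows; the channel counts are illustrative assumptions:

    import torch.nn as nn

    def edan_down_block(in_c, out_c):
        """One EDAN downsampling layer: two 3x3 convolutions, each followed by
        batch normalization and ReLU, then 2x2 max-pooling with stride 2."""
        return nn.Sequential(
            nn.Conv2d(in_c, out_c, 3, padding=1),
            nn.BatchNorm2d(out_c), nn.ReLU(inplace=True),
            nn.Conv2d(out_c, out_c, 3, padding=1),
            nn.BatchNorm2d(out_c), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )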
Further, in the multi-scale attention module MSAM, the pairwise correlation function between x and y and the feature transfer function are calculated as follows:

φ(x_k, y_j) = f(x_k)^T g(y_j)

where f(x_k) = W_f x_k and g(y_j) = W_g y_j; W_f and W_g are convolution layers;

θ(y_j) generates a new representation of y_j through a convolution layer:

θ(y_j) = W_θ y_j

where W_θ is a convolution layer.
Further, the probabilistic noise correction module PNCM includes a weight-sharing twin neural network formed by convolution layers, two independent feature extraction layers formed by fully connected layers, and a reparameterization operation; the module regards the fused feature map output by TDAM as a distribution and calculates the mean and variance as:

μ = h_{θ_μ}(g_φ(F_fused)),  Σ = h_{θ_Σ}(g_φ(F_fused))

where μ and Σ are the mean and variance of the features, respectively, F_fused is the fused feature output by TDAM, g_φ(·) is the weight-sharing twin neural network consisting of convolution layers, and h_{θ_μ}(·) and h_{θ_Σ}(·) are the feature extraction layers with weight parameters θ_μ and θ_Σ.

Random noise ε is sampled from the standard Gaussian distribution N(0, I) by the reparameterization operation, and the embedded vector s = μ + ε ⊙ Σ is obtained and serves as the output of the probabilistic noise correction module.
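A minimal PyTorch sketch of PNCM under these equations; the trunk depth, pooling and layer widths are assumptions for illustration, and positivity of the variance head is not enforced here.

    import torch
    import torch.nn as nn

    class PNCM(nn.Module):
        """Sketch of PNCM: a weight-sharing convolutional trunk g_phi (the twin
        branches share weights, so a single pass is equivalent), two fully
        connected heads h_mu and h_Sigma, and the reparameterization
        s = mu + eps * Sigma with eps ~ N(0, I)."""
        def __init__(self, in_c, dim):
            super().__init__()
            self.g_phi = nn.Sequential(
                nn.Conv2d(in_c, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.h_mu = nn.Linear(32, dim)       # h_{theta_mu}
            self.h_sigma = nn.Linear(32, dim)    # h_{theta_Sigma}

        def forward(self, f_fused):
            feat = self.g_phi(f_fused)
            mu, sigma = self.h_mu(feat), self.h_sigma(feat)
            eps = torch.randn_like(sigma)        # eps ~ N(0, I)
            return mu + eps * sigma              # embedded vector s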
Further, during training the DeU-Net adopts a total loss function L_total as the loss function of the network, calculated as:

L_total(y, p_DeU-Net) = L_CE(y, p_DeU-Net) + α · L_U

where y is the segmentation label, p_DeU-Net is the prediction of DeU-Net, and α is a hyperparameter balancing the uncertainty loss L_U and the cross-entropy function L_CE.

The uncertainty loss L_U is calculated as:

L_U = (1/n) Σ_{i=1}^{n} q_i

where n is the batch size and q_i is calculated as:

q_i = (1/m) Σ_{j=1}^{m} [diag(Σ_i)]_j

where diag(·) is the diagonal vector of the input tensor, m is the total feature dimension, and Σ_i is the variance of the i-th slice.

The cross entropy L_CE is calculated as:

L_CE = − Σ_{c=1}^{M} y_c log(p_c)

where M is the number of classes, y_c is a one-hot vector, and p_c is the probability predicted by DeU-Net of belonging to class c;

the weight parameters θ are updated by gradient descent with a standard Adam optimizer:

θ_{k+1} = θ_k − η · ∇_θ L_total(θ_k)

where η is the learning rate and θ_k is the weight parameter at step k.
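A sketch of this training loss in PyTorch, assuming the combination L_total = L_CE + α·L_U reconstructed above; α = 0.1 is an arbitrary example value, and sigma is taken to be the per-sample variance vector diag(Σ_i):

    import torch
    import torch.nn.functional as F

    def total_loss(logits, target, sigma, alpha=0.1):
        """logits: (n, M, ...) class scores; target: (n, ...) labels;
        sigma: (n, m) per-sample feature variances from PNCM."""
        l_ce = F.cross_entropy(logits, target)   # L_CE over M classes
        q = sigma.mean(dim=1)                    # q_i = (1/m) sum_j diag(Sigma_i)_j
        l_u = q.mean()                           # L_U = (1/n) sum_i q_i
        return l_ce + alpha * l_u

    # Weight updates then use a standard Adam optimizer with learning rate eta:
    # optimizer = torch.optim.Adam(model.parameters(), lr=eta)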
The beneficial effects of the invention are as follows:
1) Depth features in 3D ventricular MRI video data can be learned automatically. Traditional visual assessment requires a doctor to observe and judge frame by frame, depends heavily on the doctor's experience and skill, and consumes a lot of time. DeU-Net automatically learns the high-dimensional features in 3D ventricular MRI video data and thereby discovers the intrinsic links between MRI images and the parts of the ventricle. Compared with traditional ventricular segmentation systems, the proposed system can learn high-order features that are difficult for the human eye to identify.
2) Accurate segmentation of the parts of the ventricle can be realized. Compared with existing depth-network-based segmentation algorithms, the proposed system accurately segments the patient's ventricular images; the left ventricle, myocardium and right ventricle regions it segments agree more closely with the visual evaluation of doctors while maintaining higher accuracy and efficiency. It is therefore of great value in helping the physician locate the patient's ventricular regions and in subsequent surgical treatment.
3) The system can be adapted to organ segmentation detection on different equipment and in different formats, such as CT images, ultrasound images and X-ray images. The proposed system is effective for every part of the ventricle over the whole time period.
4) Network training with a small data volume can be realized. The invention increases the sample size through image enhancement and trains and tests the model on that basis, avoiding overfitting during network training and improving its robustness. In addition, to improve the segmentation of tiny structures during ventricular contraction, the invention uses multi-frame quality enhancement to acquire the inter-frame spatiotemporal information in the 3D ventricular MRI video to compensate the target image, and uses deformable convolution to better fuse the compensation information into the target image, enhancing segmentation precision.
Drawings
FIG. 1 is a block diagram of the deep-learning-based 3D ventricular NMR video segmentation optimization system according to an embodiment of the invention;
FIG. 2 is a flow chart of an implementation of the deep-learning-based 3D ventricular NMR video segmentation optimization system according to an embodiment of the invention;
FIG. 3 is a schematic diagram of TDAM in one embodiment of the invention;
FIG. 4 is a schematic diagram of EDAN in one embodiment of the invention;
FIG. 5 is a schematic diagram of the DeConv structure in one embodiment of the invention;
FIG. 6 is a schematic diagram of MSAM in one embodiment of the invention;
FIG. 7 is a schematic diagram of PNCM in one embodiment of the invention;
FIG. 8 is a graph of ventricular segmentation results of DeU-Net in one embodiment of the invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and the specific examples.
As shown in fig. 1 and fig. 2, the 3D ventricular MRI video segmentation optimization system provided by the present invention includes a 3D ventricular magnetic resonance imaging (MRI) video data preprocessing module, a deformable convolution depth attention network Deformable U-Net (DeU-Net) and an image detection module;
The 3D ventricular MRI video data preprocessing module comprises a data enhancement module and a data division module:
The data enhancement module: the existing 3D ventricular MRI video data set is split into individual MRI frames, the data set is expanded by rotation, contrast adjustment and scaling, and the image size is normalized. The 3D ventricular MRI video is organized along four axes x, y, z and t, where x, y and z form the spatial coordinate system and t is the time axis; video frames of the x-y plane are selected.
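A torchvision-based sketch of this augmentation step; the rotation, contrast and scale ranges below are illustrative assumptions, not values fixed by the patent.

    import torchvision.transforms as T

    # Expand the data set by rotation, contrast adjustment and scaling,
    # then normalize the image; the ranges are example values only.
    augment = T.Compose([
        T.RandomRotation(degrees=15),
        T.ColorJitter(contrast=0.2),
        T.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        T.ToTensor(),
        T.Normalize(mean=[0.5], std=[0.5]),
    ])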
The data division module: the enhanced image data are divided into a training set and a test set, each containing complete 3D ventricular MRI images. Before an image is input into the network, the r frames before and after the target frame along the time (t) axis are selected together with the target frame as input, i.e. 2r+1 frames of MRI images.
The deformable convolution depth attention network Deformable U-Net (DeU-Net) includes a depth spatiotemporal deformable convolution fusion module (Temporal Deformable Aggregation Module, TDAM), an enhanced deformable convolution attention network (Enhanced Deformable Attention Network, EDAN) and a probabilistic noise correction module (Probabilistic Noise Correction Module, PNCM):
As shown in fig. 3, the depth spatiotemporal deformable convolution fusion module TDAM has a 9-layer structure. The first layer comprises a convolution layer and a ReLU function and converts the (2r+1)×in_c input channels into nf channels, where in_c is the number of input image channels and nf is the number of output channels of the custom convolution layer. Layers 2 to 4 are downsampling structures, each comprising two convolution layers and two ReLU functions. Layers 5 to 6 are upsampling structures, each comprising a convolution layer, a deconvolution layer and two ReLU functions. Layer 7 is a skip-transfer structure that processes the features obtained by downsampling and fuses them with the upsampling result; it comprises two convolution layers, one deconvolution layer and three ReLU functions. Layer 8 is an offset output structure comprising two convolution layers and a ReLU function; its second convolution layer outputs (2r+1)×2×(deform_ks)² channels, where deform_ks is the deformable convolution kernel size. The ninth layer comprises a deformable convolution and a ReLU function; it takes the input images and the offsets as inputs and produces the fused high-dimensional feature maps of the image. All convolutions and deconvolutions in the upsampling and downsampling structures have stride 2, padding 1 and an unchanged number of channels; the remaining convolutions have stride 1 and padding 0, preserving the feature size. The calculation process of the convolution layer is as follows:

conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels filled around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel;
The 3×3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n enumerates the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position.

In the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|. Thus, the feature y′(p_0) of the deformable convolution is:

y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region obtained through the U-Net network.
Denoting the MRI frames of the 3D ventricle fed into the Deformable U-Net network by I_h (h = 1, …, 2r+1), TDAM outputs the fused feature according to:

F(k) = Σ_{h=1}^{2r+1} Σ_{s=1}^{S} w_{h_0,h}(k_s) · I_h(k + k_s)

where F(k) is the resulting feature, S is the convolution kernel size, w_{h_0,h} is the kernel for the h-th channel, I_h is the image of the h-th channel, h_0 is the current channel, k is any spatial position, and k_s is the deformable convolution sampling offset. TDAM additionally provides a learnable offset δ_{(h,k),s} so that

k_s ← k_s + δ_{(h,k),s}

where δ_{(h,k),s} is the learned sampling offset at position k for the h-th channel, and the whole offset prediction network produces the offset field:

δ = G(I_1, …, I_{2r+1})

where G(·) is a U-Net network.
The activation function used in TDAM is the linear rectification unit, except that the last layer uses linear activation. The linear rectification unit g(z) is calculated as:

g(z) = max(0, z)

The linear activation function g(z) is calculated as:

g(z) = z
The input data of TDAM is B×(2r+1)×3×H×W, where B is the batch size, 2r+1 is the number of input MRI images, 3 is the number of image channels, H is the image height and W is the image width. In this embodiment the input MRI size is 12×3×3×256×256, which the first layer structure changes to 12×32×256×256. After the three downsamplings of layers 2 to 4, the data sizes are in turn 12×32×128×128, 12×32×64×64 and 12×32×32×32. After the upsampling of layers 5 to 6, the image feature sizes are in turn 12×32×64×64 and 12×32×128×128. The image feature size obtained through the layer-7 skip-transfer structure is 12×32×128×128, which is combined with the upsampled input before being passed into the upsampling structure. Layer 8, the offset output structure, produces image features of size 12×54×256×256. The ninth-layer spatiotemporal deformable convolution structure yields the fused feature maps with image feature size 12×64×256×256.
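As a sanity check on these sizes, the layer-8 offset channel count follows directly from (2r+1)×2×(deform_ks)²; for this embodiment, r = 1 and deform_ks = 3:

    r, deform_ks = 1, 3                  # embodiment values: 2r+1 = 3 input frames
    offset_channels = (2 * r + 1) * 2 * deform_ks ** 2
    print(offset_channels)               # 54, matching the 12x54x256x256 offset feature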
As shown in fig. 4, the enhanced deformable convolution attention network EDAN contains one downsampling channel and one upsampling channel, each with an L-layer (L = 4) structure. Each layer in the downsampling channel comprises two 3×3 convolution layers and a 2×2 max-pooling layer with stride 2, each convolution layer being followed by a batch normalization operation and a ReLU function. EDAN takes the high-dimensional image features output by TDAM as input; L layers of downsampling yield the feature F_L^down, which is convolved to obtain the original feature F_L^up of the L-th layer of the upsampling channel. F_L^down and F_L^up are spliced and input together with F_L^down into the DeConv(a) module, yielding the offset Δ_L and the fused L-th layer feature F̃_L. The original feature F_i^up of the i-th layer of the upsampling channel is computed from the fused (i+1)-th layer feature F̃_{i+1} by a 3×3 up-convolution doubling operation, each up-convolution being followed by a batch normalization operation and a ReLU function, i ∈ [1, L−1]. The original feature F_i^up of the i-th layer is spliced with the feature F_i^down of the corresponding layer in the downsampling channel; the spliced feature, Δ_{i+1} and F̃_{i+1} are input together into the DeConv(b) module, finally yielding the fused feature F̃_1 of the last layer of the upsampling channel.
As shown in fig. 5, the DeConv(a) module contains one deformable convolution layer and three convolution layers. The spliced feature Concat(F_L^down, F_L^up) is passed through two convolution layers to compute the offset Δ_L; Δ_L and F_L^down are then input into the deformable convolution layer, and one further convolution layer yields the fused L-th layer feature F̃_L. The specific calculation is:

Δ_L = Conv(Conv(Concat(F_L^down, F_L^up))),  F̃_L = Conv(DeformConv(F_L^down, Δ_L))
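Under the reconstruction above, DeConv(a) can be sketched as follows. This is a minimal PyTorch illustration; the channel widths, the weight initialization and the use of torchvision's deform_conv2d are assumptions, not the patented implementation.

    import torch
    import torch.nn as nn
    from torchvision.ops import deform_conv2d

    class DeConvA(nn.Module):
        """Sketch of DeConv(a): two convolutions predict the offset Delta_L from
        the spliced features, a deformable convolution applies it to F_L^down,
        and a final convolution yields the fused feature F~_L."""
        def __init__(self, c, ks=3):
            super().__init__()
            self.offset = nn.Sequential(
                nn.Conv2d(2 * c, c, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(c, 2 * ks * ks, 3, padding=1))
            self.weight = nn.Parameter(torch.randn(c, c, ks, ks) * 0.01)
            self.out = nn.Conv2d(c, c, 3, padding=1)

        def forward(self, f_down, f_up):
            delta = self.offset(torch.cat([f_down, f_up], dim=1))      # Delta_L
            fused = deform_conv2d(f_down, delta, self.weight, padding=1)
            return delta, self.out(fused)                              # Delta_L, F~_L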
As shown in fig. 5, the DeConv(b) module comprises a multi-scale attention module MSAM, one deformable convolution layer, two bilinear interpolation operations and four convolution layers. F̃_{i+1} is bilinearly interpolated and spliced with the convolved feature of the (i+1)-th layer offset Δ_{i+1}, and the i-th layer offset Δ_i is calculated by a convolution layer; Δ_i and the spliced feature are input together into the deformable convolution layer, yielding the feature x (H×W).
As shown in fig. 6, the multi-scale attention module MSAM takes x (H×W) and the convolved feature y of F̃_{i+1} as inputs and derives the attention feature z, where H and W denote the height and width of the input feature, respectively. The calculation formula of MSAM is:

z_k = (1/σ(x)) Σ_j φ(x_k, y_j) θ(y_j)

where k and j index the inputs x, y and the output z, k ∈ [1, H×W], σ(·) is a scalar normalization function, and φ(·) computes the pairwise correlation between x and y as:

φ(x_k, y_j) = f(x_k)^T g(y_j)

where f(x_k) = W_f x_k and g(y_j) = W_g y_j; W_f and W_g are convolution layers;

θ(y_j) generates a new representation of y_j through a convolution layer:

θ(y_j) = W_θ y_j

where W_θ is a convolution layer.

The feature obtained by bilinear interpolation of F̃_{i+1} is spliced with the attention feature z and x (H×W) and then convolved to obtain F̃_i. The calculation formula of DeConv(b) is:

F̃_i = Conv(Concat(Bilinear(F̃_{i+1}), z, x))
As shown in fig. 7, the probabilistic noise correction module PNCM comprises a weight-sharing twin neural network formed by convolution layers, two independent feature extraction layers formed by fully connected layers, and a reparameterization operation. The module regards the fused feature map output by TDAM as a distribution and calculates the mean and variance as:

μ = h_{θ_μ}(g_φ(F_fused)),  Σ = h_{θ_Σ}(g_φ(F_fused))

where μ and Σ are the mean and variance of the features, respectively, F_fused is the fused feature of the target frame output by TDAM, g_φ(·) is the weight-sharing twin neural network consisting of convolution layers, and h_{θ_μ}(·) and h_{θ_Σ}(·) are the feature extraction layers with parameters θ_μ and θ_Σ.

The reparameterization samples random noise ε from the standard Gaussian distribution N(0, I) and yields the embedded vector s = μ + ε ⊙ Σ as the module output.
During model training, DeU-Net is optimized with the total loss function L_total:

L_total(y, p_DeU-Net) = L_CE(y, p_DeU-Net) + α · L_U

where y is the segmentation label, p_DeU-Net is the prediction of DeU-Net, and α is a hyperparameter balancing the uncertainty loss L_U and the cross-entropy function L_CE.

The uncertainty loss L_U is calculated as:

L_U = (1/n) Σ_{i=1}^{n} q_i

where n is the batch size and q_i is calculated as:

q_i = (1/m) Σ_{j=1}^{m} [diag(Σ_i)]_j

where diag(·) is the diagonal vector of the input tensor, m is the total feature dimension, and Σ_i is the variance of the i-th slice.

The cross entropy L_CE is calculated as:

L_CE = − Σ_{c=1}^{M} y_c log(p_c)

where M is the number of classes, y_c is a one-hot vector, and p_c is the probability predicted by the network model of belonging to class c;

the weight parameters θ are updated by gradient descent with a standard Adam optimizer:

θ_{k+1} = θ_k − η · ∇_θ L_total(θ_k)

where η is the learning rate and θ_k is the weight parameter at step k.
The image detection module is used for segmenting the 3D ventricular regions; the probability heat map of each 3D ventricular MRI image in the test set is calculated with the trained network. Specifically, the feature map obtained by EDAN is expanded and spliced with the embedded vector obtained by PNCM, and a probability heat map is obtained through a convolution layer. The probability heat map corresponding to each ventricular MRI image is then partitioned according to the segmentation probabilities to obtain the segmentation result, namely the left ventricular region, the myocardial region and the right ventricular region.
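The final prediction step can be sketched as follows; the fusion head and all channel sizes are hypothetical stand-ins for the convolution layer described above.

    import torch
    import torch.nn as nn

    def predict_segmentation(feat_map, embed, head):
        """feat_map: EDAN output (B, C, H, W); embed: PNCM vector (B, D);
        head: a convolution layer, e.g. nn.Conv2d(C + D, 4, kernel_size=1).
        The embedding is expanded over the spatial grid, spliced with the
        feature map, and convolved into a probability heat map; argmax over
        classes gives background / left ventricle / myocardium / right ventricle."""
        b, _, h, w = feat_map.shape
        embed_map = embed[:, :, None, None].expand(b, embed.shape[1], h, w)
        logits = head(torch.cat([feat_map, embed_map], dim=1))
        probs = torch.softmax(logits, dim=1)     # probability heat map
        return probs.argmax(dim=1)               # per-pixel segmentation labels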
In a specific application of the system of this embodiment, as shown in fig. 2, the collected 3D ventricular MRI data set is first divided into a training set and a test set. A U-Net network is used to build the depth spatiotemporal deformable convolution fusion module TDAM, which obtains the offsets of the target image within a 3D ventricular MRI video segment; the offsets are fused into the target image by deformable convolution. The result is then input into the deformable convolution attention network EDAN and the probabilistic noise correction module PNCM, which respectively produce a feature map fused with spatiotemporal information and an embedded vector representing uncertainty. The feature map is spliced with the expanded embedded vector, and the segmentation result image is finally obtained, as shown in fig. 8, realizing accurate segmentation of the patient's ventricle in the 3D ventricular MRI video. The final Dice score of the whole video segmentation result is 92.9%; compared with existing segmentation algorithms based on deep neural networks, the left ventricle, myocardium and right ventricle regions segmented by the system agree more closely with doctors' visual assessment while maintaining higher accuracy and efficiency.
The present invention is not limited to the above-described preferred embodiment. Any other form of deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation optimization system obtained under the teaching of the present invention shall fall within the scope of the present invention.

Claims (8)

1. The 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning is characterized by comprising a 3D ventricular nuclear magnetic resonance MRI video data preprocessing module, a deformable convolution depth attention network DeU-Net and an image detection module;
The 3D ventricular MRI video data preprocessing module performs normalization processing on each frame of MRI image in the existing 3D ventricular MRI video data set and inputs the result into the deformable convolution depth attention network DeU-Net;
The deformable convolution depth attention network DeU-Net includes a depth spatiotemporal deformable convolution fusion module TDAM, an enhanced deformable convolution attention network EDAN and a probabilistic noise correction module PNCM:
the depth spatiotemporal deformable convolution fusion module TDAM: the module comprises a U-Net network and a deformable convolution layer; the U-Net network takes the 3D ventricular MRI video data set as input and outputs a high-dimensional feature compensation region for the images in the MRI video segment; the deformable convolution layer takes the 3D ventricular MRI video data set and the high-dimensional feature compensation region as input to obtain the compensated high-dimensional image features;
the enhanced deformable convolution attention network EDAN: the module comprises a downsampling channel and an upsampling channel; EDAN takes the high-dimensional image features output by TDAM as input; L layers of downsampling yield the feature F_L^down, which is convolved to obtain the original feature F_L^up of the L-th layer of the upsampling channel; F_L^down and F_L^up are spliced, and the spliced feature is input together with F_L^down into the DeConv(a) module, which obtains the offset Δ_L and the fused L-th layer feature F̃_L through deformable convolution:

Δ_L = Conv(Concat(F_L^down, F_L^up)),  F̃_L = Conv(DeformConv(F_L^down, Δ_L))

the original feature F_i^up of the i-th layer of the upsampling channel is spliced with the feature F_i^down of the corresponding layer in the downsampling channel; the spliced feature, Δ_{i+1} and F̃_{i+1} are input together into the DeConv(b) module, where i ∈ [1, L−1]; finally the fused feature F̃_1 of the last layer of the upsampling channel is obtained;

the DeConv(b) module includes a multi-scale attention module MSAM and a deformable convolution layer; F̃_{i+1} is bilinearly interpolated, spliced with Δ_{i+1} and convolved to obtain Δ_i; the inputs of MSAM are the feature x (H×W) obtained by deformable convolution of the spliced feature with Δ_i and the convolved feature y of F̃_{i+1}, where H and W denote the height and width of the input feature, respectively; the calculation formula of MSAM is:

z_k = (1/σ(x)) Σ_j φ(x_k, y_j) θ(y_j)

where k and j index the inputs x, y and the output z, k ∈ [1, H×W], σ(·) is a scalar normalization function, φ(·) computes the pairwise correlation between x and y, and θ(·) is the feature transfer function;

the feature obtained by bilinear interpolation of F̃_{i+1} is spliced with z and x (H×W) and then convolved to obtain F̃_i;
the probabilistic noise correction module PNCM: the module comprises a weight-sharing twin neural network formed by convolution layers, two independent feature extraction layers formed by fully connected layers, and a reparameterization operation; the twin neural network takes the high-dimensional image features output by TDAM as input; its output is fed into the two feature extraction layers, one of which outputs the mean of the high-dimensional features of the input image and the other the variance; through the reparameterization operation the variance is combined with the mean to generate an embedded vector as the final output;
the image detection module is used for 3D ventricular region segmentation; a probability heat map of each 3D ventricular MRI image in the test set is calculated using the trained deformable convolution depth attention network DeU-Net; specifically, the feature map F̃_1 obtained by EDAN is spliced with the embedded vector obtained by PNCM, and the probability heat map is obtained through a convolution layer; the probability heat map corresponding to each ventricular MRI image is then partitioned according to the segmentation probabilities to obtain the segmentation result, namely the left ventricular region, the myocardial region and the right ventricular region.
2. The deep-learning-based 3D ventricular MRI video segmentation optimization system of claim 1, wherein the 3D ventricular MRI video data preprocessing module comprises a data enhancement module, the data enhancement module expands the data set by rotation, contrast adjustment and scaling during image processing, the 3D ventricular MRI video is organized along four axes x, y, z and t, where x, y and z form the spatial coordinate system and t is the time axis, and video frames of the x-y plane are selected.
3. The deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation optimization system of claim 1, wherein before each frame of MRI image is input into the Deformable U-Net network, the r frames before and after the target frame along the time (t) axis are selected together with the target frame as the input of the Deformable U-Net network, i.e. 2r+1 frames of MRI images.
4. The deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation optimization system of claim 1, wherein in the depth spatiotemporal deformable convolution fusion module TDAM, the calculation process of the convolution layer is as follows:

conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels filled around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel;

the 3×3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n enumerates the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position;

in the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|; thus, the feature y′(p_0) of the deformable convolution is:

y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region obtained through the U-Net network.
5. The deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation optimization system of claim 1, wherein the enhanced deformable convolution attention network EDAN comprises an upsampling channel and a downsampling channel, each containing an L-layer structure; each layer in the downsampling channel comprises two 3×3 convolution layers and a 2×2 max-pooling layer with stride 2, each convolution layer being followed by a batch normalization operation and a ReLU function; each layer in the upsampling channel uses a 3×3 up-convolution that doubles the feature resolution, followed by a batch normalization operation and a ReLU function.
6. The deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation optimization system of claim 1, wherein in the multi-scale attention module MSAM, the pairwise correlation function between x and y and the feature transfer function are calculated as follows:

φ(x_k, y_j) = f(x_k)^T g(y_j)

where f(x_k) = W_f x_k and g(y_j) = W_g y_j; W_f and W_g are convolution layers;

θ(y_j) generates a new representation of y_j through a convolution layer:

θ(y_j) = W_θ y_j

where W_θ is a convolution layer.
7. The deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation optimization system of claim 1, wherein the probabilistic noise correction module PNCM comprises a weight-sharing twin neural network formed by convolution layers, two independent feature extraction layers formed by fully connected layers, and a reparameterization operation; the module regards the fused feature map output by TDAM as a distribution and calculates the mean and variance as:

μ = h_{θ_μ}(g_φ(F_fused)),  Σ = h_{θ_Σ}(g_φ(F_fused))

where μ and Σ are the mean and variance of the features, respectively, F_fused is the fused feature output by TDAM, g_φ(·) is the weight-sharing twin neural network consisting of convolution layers, and h_{θ_μ}(·) and h_{θ_Σ}(·) are the feature extraction layers with weight parameters θ_μ and θ_Σ;

random noise ε is sampled from the standard Gaussian distribution N(0, I) by the reparameterization operation, and the embedded vector s = μ + ε ⊙ Σ is obtained and serves as the output of the probabilistic noise correction module.
8. The deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation optimization system of claim 1, wherein during training the DeU-Net adopts a total loss function L_total as the loss function of the network, calculated as:

L_total(y, p_DeU-Net) = L_CE(y, p_DeU-Net) + α · L_U

where y is the segmentation label, p_DeU-Net is the prediction of DeU-Net, and α is a hyperparameter balancing the uncertainty loss L_U and the cross-entropy function L_CE;

the uncertainty loss L_U is calculated as:

L_U = (1/n) Σ_{i=1}^{n} q_i

where n is the batch size and q_i is calculated as:

q_i = (1/m) Σ_{j=1}^{m} [diag(Σ_i)]_j

where diag(·) is the diagonal vector of the input tensor, m is the total feature dimension, and Σ_i is the variance of the i-th slice; the cross entropy L_CE is calculated as:

L_CE = − Σ_{c=1}^{M} y_c log(p_c)

where M is the number of classes, y_c is a one-hot vector, and p_c is the probability predicted by DeU-Net of belonging to class c; the weight parameters θ are updated by gradient descent with a standard Adam optimizer:

θ_{k+1} = θ_k − η · ∇_θ L_total(θ_k)

where η is the learning rate and θ_k is the weight parameter at step k.
CN202210035567.7A 2022-01-13 2022-01-13 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning Active CN114359310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210035567.7A CN114359310B (en) 2022-01-13 2022-01-13 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning


Publications (2)

Publication Number Publication Date
CN114359310A CN114359310A (en) 2022-04-15
CN114359310B (en) 2024-06-04

Family

ID=81110097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210035567.7A Active CN114359310B (en) 2022-01-13 2022-01-13 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning

Country Status (1)

Country Link
CN (1) CN114359310B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115239765B (en) * 2022-08-02 2024-03-29 合肥工业大学 Infrared image target tracking system and method based on multi-scale deformable attention
CN116152285B (en) * 2023-02-15 2023-08-18 哈尔滨工业大学 Image segmentation system based on deep learning and gray information
CN117789153B (en) * 2024-02-26 2024-05-03 浙江驿公里智能科技有限公司 Automobile oil tank outer cover positioning system and method based on computer vision
CN118096785B (en) * 2024-04-28 2024-06-25 北明成功软件(山东)有限公司 Image segmentation method and system based on cascade attention and multi-scale feature fusion


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191476A (en) * 2018-09-10 2019-01-11 重庆邮电大学 The automatic segmentation of Biomedical Image based on U-net network structure
WO2020246996A1 (en) * 2019-06-06 2020-12-10 Elekta, Inc. Sct image generation using cyclegan with deformable layers
CN111311592A (en) * 2020-03-13 2020-06-19 中南大学 Three-dimensional medical image automatic segmentation method based on deep learning
CN111768432A (en) * 2020-06-30 2020-10-13 中国科学院自动化研究所 Moving target segmentation method and system based on twin deep neural network
CN111932550A (en) * 2020-07-01 2020-11-13 浙江大学 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN113516659A (en) * 2021-09-15 2021-10-19 浙江大学 Medical image automatic segmentation method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DeU-Net 2.0: Enhanced deformable U-Net for 3D cardiac cine MRI segmentation; Shunjie Dong et al.; Medical Image Analysis; 2022-05-01; full text *
DeU-Net: Deformable U-Net for 3D Cardiac MRI Video Segmentation; Shunjie Dong et al.; Medical Image Computing and Computer Assisted Intervention – MICCAI 2020; 2020-09-29; full text *
Left ventricular image segmentation method based on fully convolutional neural networks; Xie Wenxin; Yuan Jinhui; Hu Xiaofei; Software Guide (软件导刊); 2020-05-15 (05); full text *

Also Published As

Publication number Publication date
CN114359310A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN111932550B (en) 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning
CN114359310B (en) 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning
US11645748B2 (en) Three-dimensional automatic location system for epileptogenic focus based on deep learning
CN110930416B (en) MRI image prostate segmentation method based on U-shaped network
US11430140B2 (en) Medical image generation, localizaton, registration system
US20220028085A1 (en) Method and system for providing an at least 3-dimensional medical image segmentation of a structure of an internal organ
WO2022121100A1 (en) Darts network-based multi-modal medical image fusion method
CN115496771A (en) Brain tumor segmentation method based on brain three-dimensional MRI image design
CN112785632B (en) Cross-modal automatic registration method for DR and DRR images in image-guided radiotherapy based on EPID
CN111584066B (en) Brain medical image diagnosis method based on convolutional neural network and symmetric information
JP7270319B2 (en) Left ventricle automatic segmentation method of SPECT three-dimensional reconstructed image
CN115830016B (en) Medical image registration model training method and equipment
Liu et al. 3-D prostate MR and TRUS images detection and segmentation for puncture biopsy
Jeevakala et al. Artificial intelligence in detection and segmentation of internal auditory canal and its nerves using deep learning techniques
CN117422788B (en) Method for generating DWI image based on CT brain stem image
CN113269774B (en) Parkinson disease classification and lesion region labeling method of MRI (magnetic resonance imaging) image
CN113361689A (en) Training method of super-resolution reconstruction network model and scanning image processing method
CN113344940A (en) Liver blood vessel image segmentation method based on deep learning
CN112750131A (en) Pelvis nuclear magnetic resonance image musculoskeletal segmentation method based on scale and sequence relation
CN115424319A (en) Strabismus recognition system based on deep learning
CN114359309A (en) Medical image segmentation method based on index point detection and shape gray scale model matching
CN115272386A (en) Multi-branch segmentation system for cerebral hemorrhage and peripheral edema based on automatic generation label
Han Cerebellum parcellation from magnetic resonance imaging using deep learning
CN115272385A (en) Automatic label generation based cooperative cross segmentation system for cerebral hemorrhage and peripheral edema
Rohini et al. ConvNet based detection and segmentation of brain tumor from MR images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant