CN111932550B - 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning - Google Patents


Info

Publication number
CN111932550B
CN111932550B (application CN202010622947.1A)
Authority
CN
China
Prior art keywords
convolution
deformable
image
module
mri
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010622947.1A
Other languages
Chinese (zh)
Other versions
CN111932550A (en)
Inventor
田梅
董舜杰
卓成
张宏
施政学
赵金龙
张茂俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010622947.1A priority Critical patent/CN111932550B/en
Publication of CN111932550A publication Critical patent/CN111932550A/en
Application granted granted Critical
Publication of CN111932550B publication Critical patent/CN111932550B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30048 Heart; Cardiac

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The invention discloses a 3D ventricular nuclear magnetic resonance video segmentation system based on deep learning, which comprises: an MRI data preprocessing module; and a depth attention network based on deformable convolution, in which the depth space-time deformable convolution fusion module TDAM feeds consecutive 3D ventricular MRI video slice images along the time axis into the network to obtain a compensation region (offset field) for the high-dimensional features within the MRI video segment and extracts the compensated high-dimensional image features through a deformable convolution layer, while a deformable convolution attention module produces an attention feature map and an additive attention module suppresses irrelevant background, finally yielding the network model. A newly input 3D ventricular MRI video is segmented directly with the trained network model. By introducing multi-frame image compensation, deformable convolution and an additive attention mechanism, the system effectively improves the accuracy and efficiency of ventricular segmentation and exhibits high robustness.

Description

3D ventricle nuclear magnetic resonance video segmentation system based on deep learning
Technical Field
The invention relates to the technical field of medical image engineering, and in particular to a 3D ventricular nuclear magnetic resonance video segmentation system based on deep learning.
Background
With the development of medical imaging technology and artificial intelligence, automatic and semi-automatic computer-aided diagnosis systems are gradually replacing traditional manual diagnosis in accurate diagnosis and treatment. Magnetic resonance imaging (MRI) is widely used in ventricular diagnostics because it causes no radiation damage and offers high resolution. To better understand the condition of a patient's ventricles, the position of each part of the ventricle must be segmented correctly by an accurate segmentation system; however, the conventional clinical procedure of visually evaluating three-dimensional MRI images is time-consuming and depends on the clinical experience of the doctor. It is therefore important to find a system that improves the accuracy and efficiency of diagnosing the parts of the cardiac chambers.
The main challenges faced by the prior art are: 1. Magnetic resonance imaging is very sensitive to body movements of the patient and prone to artifacts, and subtle changes are easily missed by the detection system, reducing detection sensitivity. 2. Most existing algorithms are only suitable for processing two-dimensional natural images, whereas MRI data are three-dimensional structures formed by parallel scanned image frames, so two-dimensional localization algorithms ignore important inter-frame information. 3. The patient's heart chambers deform severely with breathing, so regions of the same tissue can deform greatly, especially the myocardium surrounding the left ventricle and the right ventricle, which strongly interferes with and challenges the segmentation system. 4. Because the amount of medical image data is small and high-quality annotated data and training samples are lacking, the trained model may overfit or generalize poorly.
In summary, providing a deep-learning-based 3D ventricular nuclear magnetic resonance video segmentation system that exploits continuity information within and between MRI video frames to improve the accuracy and efficiency of ventricular segmentation is an important technical problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a 3D ventricular nuclear magnetic resonance video segmentation system based on deep learning that addresses the shortcomings of the prior art in medical-image ventricular segmentation, automatically segments the positions of all parts of the ventricle, delivers highly accurate localization results, and gives the model high robustness.
The purpose of the invention is realized by the following technical scheme: a 3D ventricular magnetic resonance video segmentation system based on deep learning, characterized by comprising a 3D ventricular magnetic resonance MRI video data preprocessing module, a Deformable convolution depth attention network Deformable U-Net (DeU-Net) and an image detection module;
the 3D ventricular magnetic resonance MRI video data preprocessing module comprises a data enhancement module and a data division module:
the data enhancement module: splitting the existing 3D ventricular MRI video data set into MRI images of each frame, expanding the data set and carrying out normalization processing on the image size;
the data dividing module: dividing the enhanced image data into a training set and a testing set; the training set and the test set both comprise complete 3D ventricular MRI images, and the training set is used for training a Deformable convolution depth attention network Deformable U-Net;
the Deformable convolution depth attention network Deformable U-Net comprises a depth space-time Deformable convolution fusion module TDAM and a depth Deformable convolution global attention module DGPA:
the depth space-time deformable convolution fusion module TDAM: the module comprises a U-Net network and a deformable convolution layer; the TDAM inputs each frame of the continuous 3D ventricular MRI video along the time axis into the U-Net network, which outputs an offset field describing the high-dimensional feature compensation region of the image within the MRI video segment; this offset field is passed, together with the input image, into the deformable convolution layer, which computes the compensated high-dimensional fused feature maps of the image, i.e., fused features containing information from the preceding and following frames;
the deep deformable convolution global attention module DGPA: the module comprises a U-Net network, a deformable convolution attention module, three additive attention modules and an output layer; the deformable convolution attention module builds on a spatial attention module by passing its input through a deformable convolution layer and adding the computed output of the deformable convolution layer to the output of the spatial attention module, which yields the output of the deformable convolution attention module; a deformable convolution attention module is added in the first-layer skip connection of the U-Net network, and an additive attention module is added in each of the other three skip connections; the compensated high-dimensional features of the image are input into the U-Net network, the high-dimensional features computed by the U-Net network are passed to the output layer to obtain an attention feature map, and a softmax regression function then yields the segmentation probability, i.e., the probability that a given region in the MRI image belongs to the left ventricle, the myocardium or the right ventricle;
the image detection module is used to segment the 3D ventricular regions: the probability heat maps of the 3D ventricular MRI images in the test set are computed with the trained network, and the probability heat map corresponding to each ventricular MRI image is segmented according to the segmentation probability obtained by the DGPA, yielding the segmentation results, namely the left ventricle region, the myocardium region and the right ventricle region.
Further, in the image processing process, the data enhancement module expands the data set through rotating, adjusting contrast and zooming, and divides the 3D ventricular MRI video into four directions of x, y, z and t, wherein x, y and z represent a space coordinate system, t represents a time axis, and a video frame of an x-y plane is selected.
Furthermore, before each frame of MRI image is input into the Deformable U-Net network, the r frames preceding and the r frames following the target frame along the time axis t are selected together with the target frame as the Deformable U-Net input, i.e., 2r+1 MRI image frames.
Furthermore, the depth space-time deformable convolution fusion module TDAM comprises a 9-layer structure. The first layer comprises a convolution layer and a ReLU function and converts the number of input channels, (2r+1) × in_c, into nf, where in_c is the number of input image channels and nf is the number of output channels of the custom convolution layer. Layers 2 to 4 are down-sampling structures comprising two convolution layers and two ReLU functions. Layers 5 to 6 are up-sampling structures comprising a convolution layer, a deconvolution layer and two ReLU functions. Layer 7 is a skip-transfer structure that processes the features obtained by down-sampling and fuses them with the up-sampling result; it comprises two convolution layers, a deconvolution layer and three ReLU functions. Layer 8 is an offset output structure comprising two convolution layers and a ReLU function, with the second convolution layer outputting (2r+1) × 2 × (deform_ks)² channels, where deform_ks is the deformable convolution kernel size. The ninth layer is a deformable convolution and a ReLU function that takes the input image and the offsets as its input and produces the high-dimensional fused feature maps of the image.
Further, in the depth space-time deformable convolution fusion module TDAM, the output size of a convolution layer is computed as:

    conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels padded around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel;
the 3 × 3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

    y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n ranges over the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position;
in the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|; thus the feature y′(p_0) of the deformable convolution is:

    y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region (offset) obtained from the U-Net network.
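As a minimal illustration, the convolution output-size formula above can be written as a small helper function (a Python sketch; the function name and example values are illustrative, not part of the patent):

    def conv_out_size(conv_in: int, kernel_size: int, padding: int = 0, stride: int = 1) -> int:
        """conv_out = (conv_in + 2*padding - kernel_size)//stride + 1."""
        return (conv_in + 2 * padding - kernel_size) // stride + 1

    # e.g. a 256-pixel input through a 3x3 kernel with stride 2 and padding 1 gives 128
    assert conv_out_size(256, kernel_size=3, padding=1, stride=2) == 128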
Furthermore, in the depth space-time deformable convolution fusion module TDAM, each frame of the 3D ventricular MRI video fed into the Deformable U-Net network is denoted x_h, and the fused feature output by the TDAM is computed as:

    F(k) = Σ_h Σ_{s=1}^{S} w_{h0,h}(s) · x_h(k + k_s)

where F(k) is the resulting feature, S is the convolution kernel size, w_{h0,h} is the kernel of the h-th channel for the current channel h0, x_h is the image of the h-th channel, k is an arbitrary spatial position, and k_s is the sampling offset of the deformable convolution; the TDAM provides additional learnable offsets δ_(h,k) and δ_(h,k),s so that

    k_s ← k_s + δ_(h,k) + δ_(h,k),s

where δ_(h,k) is the offset learned for the h-th channel at spatial position k and δ_(h,k),s is the sampling offset of the learned offset.
Further, the DGPA can extract correlated features across global pixel positions; the compensated high-dimensional MRI image feature I is input to a 3 × 3 deformable convolution kernel W_d in the deformable convolution attention module to obtain an output O:

    O = W_d(I)

The compensated high-dimensional MRI image feature I is also input to three 1 × 1 convolution kernels in the spatial attention module, generating new feature maps B, C, D ∈ R^{N×M}, where N is the number of channels of the feature map, M = H × W is the number of pixels of the feature map, and H and W are the height and width of the feature map, respectively; after transposition, C and B are matrix-multiplied and the result is passed through the softmax formula to obtain the spatial attention map S ∈ R^{M×M}, where each element of S is computed as:

    s_ji = exp(B_i · C_j^T) / Σ_{i=1}^{M} exp(B_i · C_j^T)

where s_ji is the element in row i and column j of S, B_i is the i-th row of the feature map B, and C_j^T is the j-th column of the transposed feature map C; the spatial attention map S is then matrix-multiplied with the feature map D, and the product is added to the result O of the preceding deformable convolution to obtain the final result E ∈ R^{N×M}:

    E_j = α · Σ_{i=1}^{M} (s_ji · D_i) + O_j

where α is a weight coefficient, D_i is the i-th row of the feature map D, and O_j is the j-th column of the output O.
Further, in the deep deformable convolution global attention module DGPA, an additive (summation) attention module is used to suppress irrelevant background; G is the feature map of dimension N × W × H from the down-sampling stage and X is the feature map of dimension N × W × H from the up-sampling stage; each passes through a 1 × 1 convolution kernel that changes its dimension to F_int × W × H, where F_int is a preset dimension parameter of the U-Net network; the two resulting matrices are added point by point and passed through a ReLU activation layer, the result is passed through a 1 × 1 convolution kernel so that its dimension becomes 1 × W × H, and a weight coefficient α is then obtained through a sigmoid function; the input X is multiplied by the weight coefficient α to obtain the attention feature map.
Further, during training the Deformable U-Net adopts the cross-entropy function as the loss function of the network, and the cross entropy Loss_seg is computed as:

    Loss_seg = − Σ_{c=1}^{M} y_c · log(p_c)

where M is the number of categories, y_c is the one-hot label vector, and p_c is the probability predicted by the network model for class c;
the weight parameters θ of the Deformable U-Net are updated by gradient descent with a standard Adam optimizer, of the form:

    θ_{k+1} = θ_k − η · ∇_θ Loss_seg(θ_k)

where η is the learning rate and θ_k is the weight parameter at the k-th step.
The invention has the following beneficial effects:
1) depth features in 3D ventricular MRI image video data can be automatically learned. Conventional visual assessment requires observation and judgment of a doctor frame by frame, is extremely dependent on the experience and skill level of the doctor, and consumes a lot of time. DeU-Net is capable of automatically learning high-dimensional features in 3D ventricular MRI image video data to discover intrinsic associations between MRI images and portions of the ventricle. Compared with the traditional ventricular segmentation system, the system provided by the invention can learn high-order features which are difficult to recognize by human eyes.
2) Accurate segmentation of each part of the ventricle can be realized. The system provided by the invention can accurately segment the ventricle image of the patient, and compared with the existing segmentation algorithm based on the depth network, the left ventricle, the cardiac muscle and the right ventricle area segmented by the system are more consistent with the visual evaluation of a doctor, and higher accuracy and efficiency are kept. Therefore, the method has high value in helping a doctor to locate the ventricular area of the patient and the subsequent surgical treatment.
3) The system can also be adapted to organ segmentation and detection in images of different formats from different devices, such as CT images, ultrasound images and X-ray images. The system proposed by the present invention is effective for each part of the ventricle over the full time period.
4) Network training with small data volume can be realized. The invention increases the sample size by using an image enhancement mode, and trains the model and test data on the basis of the sample size, thereby avoiding overfitting of network training and improving the robustness of network training. In addition, in order to improve the segmentation of minute parts during ventricular contraction, the invention adopts a multi-frame quality enhancement mode to acquire space-time information between frames in the 3D ventricular MRI video to compensate the target image, and simultaneously uses a deformable convolution method to better fuse the compensated information into the target image, thereby enhancing the segmentation precision.
Drawings
FIG. 1 is a block diagram of a deep learning based 3D ventricular MRI video segmentation system according to an embodiment of the present invention;
FIG. 2 is a flow chart of an implementation of a deep learning based 3D ventricular nuclear magnetic resonance video segmentation system according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the construction of DeU-Net according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of DeU-Net configuration according to one embodiment of the present invention;
FIG. 5 is a graph of DeU-Net ventricular segmentation results, in accordance with one embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in fig. 1 and fig. 2, the 3D ventricular MRI video segmentation system provided by the present invention includes a 3D ventricular nuclear Magnetic Resonance (MRI) video data preprocessing module, a Deformable convolution depth attention network Deformable U-Net (DeU-Net) and an image detection module;
the 3D ventricular nuclear Magnetic Resonance (MRI) video data preprocessing module comprises a data enhancement module and a data partitioning module:
the data enhancement module: the method comprises the steps of splitting an existing 3D ventricular MRI video data set into MRI images of frames, expanding the data set in a rotating, contrast adjusting and scaling mode, and normalizing the size of the images. The 3D ventricular MRI video is divided into four directions of x, y, z and t, wherein the x, y and z represent a space coordinate system, the t represents a time axis, and video frames of an x-y plane are selected.
The data dividing module: dividing the enhanced image data into a training set and a testing set; both the training set and the test set contain complete 3D ventricular MRI images. Before an image is input into a network, front and back r frames of a target frame along the time t axis direction are selected as Deformable U-Net network input, namely 2r +1 frame MRI images.
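A minimal sketch of selecting the 2r+1 input frames around a target frame along the time axis t is shown below; the clamping of indices at the sequence boundaries is an assumption, since the patent does not specify how edge frames are handled:

    import numpy as np

    def select_frames(video: np.ndarray, t: int, r: int = 1) -> np.ndarray:
        """video: T x H x W frames of one slice over time -> (2r+1) x H x W stack."""
        length = video.shape[0]
        idx = [min(max(i, 0), length - 1) for i in range(t - r, t + r + 1)]
        return video[idx]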
As shown in fig. 3 and 4, the Deformable convolved deep attention network Deformable U-Net (DeU-Net) includes a deep spatiotemporal Deformable convolution fusion module TDAM and a deep Deformable convolved global attention module DGPA:
the depth space-time deformable convolution fusion module TDAM comprises 9 layers of structures, the first layer comprises a convolution layer and a Relu function, and the input channel number (2r +1) × in _ c is converted into nf, wherein in _ c is the input image channel number, and nf is the output channel number of the custom convolution layer. Layers 2 to 4 are downsampled structures comprising two convolutional layers and two Relu functions. Layers 5 to 6 are upsampled structures comprising a convolutional layer, an anti-convolutional layer and two Relu functions. The 7 th layer is a jump transfer structure, and the features obtained by down sampling are processed and then fused with an up sampling result, and the jump transfer structure comprises two convolution layers, an anti-convolution layer and three Relu functions. The 8 th layer is an offset output structure and comprises two convolution layers and a Relu function, and the second convolution layer outputs the channel number (2r +1) × 2 × (defem _ ks)2Where deform _ ks is the deformable convolution kernel size. The ninth layer structure is a deformable convolution and a Relu function, and the input image and the offset are used as the layer input to obtain the high-dimensional characteristic fused feature maps of the image. In all convolution and deconvolution in upsampling and downsampling, the step size is 2, padding is 1, and the number of channels is the same. The rest(s)Convolution step size is 1 and padding is 0 to preserve feature size. The calculation process of the convolutional layer is as follows:
    conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels padded around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel.
The 3 × 3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

    y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n ranges over the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position.
In the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|; thus the feature y′(p_0) of the deformable convolution is:

    y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region (offset) obtained from the U-Net network.
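As a minimal, hedged illustration of this offset-augmented sampling (not the patent's own implementation), torchvision's DeformConv2d can be driven with an explicit offset tensor carrying the 2·|R| = 18 offset values per output pixel for a 3 × 3 kernel; setting the offsets Δp_n to zero reduces it to a standard convolution:

    import torch
    from torchvision.ops import DeformConv2d

    x = torch.randn(1, 8, 64, 64)                  # input features x(.)
    offsets = torch.zeros(1, 2 * 3 * 3, 64, 64)    # Δp_n = 0 for every kernel position
    deform = DeformConv2d(8, 8, kernel_size=3, padding=1)
    y = deform(x, offsets)                         # y'(p_0) = Σ w(p_n)·x(p_0 + p_n + Δp_n)
    print(y.shape)                                 # torch.Size([1, 8, 64, 64])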
The frame-by-frame 3D ventricular MRI images fed into the Deformable U-Net network are denoted x_h, and the fused feature output by the TDAM is computed as:

    F(k) = Σ_h Σ_{s=1}^{S} w_{h0,h}(s) · x_h(k + k_s)

where F(k) is the resulting feature, S is the convolution kernel size, w_{h0,h} is the kernel of the h-th channel for the current channel h0, x_h is the image of the h-th channel, k is an arbitrary spatial position, and k_s is the sampling offset of the deformable convolution; the TDAM provides additional learnable offsets δ_(h,k) and δ_(h,k),s so that

    k_s ← k_s + δ_(h,k) + δ_(h,k),s

where δ_(h,k) is the learnable offset field of the h-th channel at position k and δ_(h,k),s is the learned sampling offset of that offset. The overall offset prediction network therefore produces the offset field

    {δ_(h,k), δ_(h,k),s} = N_offset(x_{t−r}, …, x_{t+r})

where N_offset(·) is a U-Net network taking the 2r+1 input frames.
The activation functions used in the TDAM are rectified linear units (ReLU), except for the last layer, which uses a linear activation. The rectified linear unit g(z) is computed as:

    g(z) = max(0, z)

and the linear activation function g(z) is computed as:

    g(z) = z
the input data in TDAM is bx (2r +1) × 3 × H × W, where B is the batch size, 2r +1 is the input MRI image frame number, 3 is the image channel number, H is the image height, and W is the image width. In this embodiment, the input MRI image size is 12 × 3 × 3 × 256 × 256, and the size is changed to 12 × 32 × 256 × 256 after passing through the first layer structure. After the third down-sampling of the 2 nd to 4 th layers, the obtained data sizes are 12 × 32 × 128 × 128, 12 × 32 × 64 × 64, and 12 × 32 × 32 × 32 in this order. After 5 th to 6 th layers are subjected to up-sampling twice, the obtained image feature sizes are 12 multiplied by 32 multiplied by 64 and 12 multiplied by 32 multiplied by 128 in sequence. The image feature size obtained by the layer 7 skip pass structure is 12 × 32 × 128 × 128, and is merged with the upsampled input before being transmitted to the upsampled structure. The 8 th layer is an offset output structure, and the obtained image feature size is 12 × 54 × 256 × 256. The image feature size of the fused feature maps obtained by the ninth layer of space-time deformable convolution structure is 12 × 64 × 256 × 256.
The depth deformable convolution global attention module DGPA comprises a U-Net network, a deformable convolution attention module, three additive attention modules (Attention Gates) and an output layer. The deformable convolution attention module builds on a spatial attention module: its input is also passed through a deformable convolution layer, and the computed output of the deformable convolution layer is added to the output of the spatial attention module to give the output of the deformable convolution attention module. A deformable convolution attention module is added in the first-layer skip connection of the U-Net network, and an Attention Gates module is added in each of the other three skip connections. The compensated high-dimensional fused feature maps of the images are input into the U-Net network, whereupon the number of input channels becomes 64 and the image sizes are unchanged.
Down-sampling is then performed 4 times. Each down-sampling operation includes a convolution with a 3 × 3 kernel that doubles the number of channels, so the image channel count becomes 128, 256, 512 and 1024 in sequence. After each convolution, nonlinear features are obtained through a ReLU activation function. A max-pooling operation with a 2 × 2 pooling kernel halves the picture size, i.e., to 128 × 128, 64 × 64, 32 × 32 and 16 × 16 in sequence.
Up-sampling is then performed 4 times. Each up-sampling operation includes a convolution with a 3 × 3 kernel that halves the number of channels, so the image channel count becomes 1024, 512, 256 and 128 in sequence. After each convolution, nonlinear features are obtained through a ReLU activation function. Linear interpolation doubles the picture size, i.e., to 32 × 32, 64 × 64, 128 × 128 and 256 × 256 in sequence.
Meanwhile, the input also enters the DGPA module to extract globally correlated pixel features, which are concatenated with the output of the DGPA network. The results of the first three down-sampling stages pass through the additive attention network to suppress irrelevant background and are then concatenated with the results of the first three up-sampling stages.
In the depth deformable convolution global attention module DGPA, a deformable-convolution global attention network is adopted to extract globally correlated pixel features. The compensated high-dimensional MRI image feature I is input to a 3 × 3 deformable convolution kernel W_d in the DGPA module to obtain an output O:

    O = W_d(I)

The compensated high-dimensional MRI image feature I is also input to three 1 × 1 convolution kernels in the spatial attention module, generating new feature maps B, C, D ∈ R^{N×M}, where N is the number of channels of the feature map, M = H × W is the number of pixels of the feature map, and H and W are the height and width of the feature map, respectively. After transposition, C and B are matrix-multiplied and the result is passed through the softmax formula to obtain the spatial attention map S ∈ R^{M×M}, where each element of S is computed as:

    s_ji = exp(B_i · C_j^T) / Σ_{i=1}^{M} exp(B_i · C_j^T)

where s_ji is the element in row i and column j of S, B_i is the i-th row of the feature map B, and C_j^T is the j-th column of the transposed feature map C. The spatial attention map S is then matrix-multiplied with the feature map D, and the product is added to the result O of the preceding deformable convolution to obtain the final result E ∈ R^{N×M}:

    E_j = α · Σ_{i=1}^{M} (s_ji · D_i) + O_j

where α is a weight coefficient, D_i is the i-th row of the feature map D, and O_j is the j-th column of the output O.
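A minimal PyTorch sketch of this deformable convolution attention module is given below, assuming the weight coefficient α is a learnable scalar initialized to zero and that the offsets for the deformable branch come from a plain 3 × 3 convolution; both choices, and all class and variable names, are illustrative assumptions rather than the patent's exact design:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.ops import DeformConv2d

    class DeformableConvAttention(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.offset = nn.Conv2d(channels, 2 * 3 * 3, 3, padding=1)    # offsets for the 3x3 kernel
            self.deform = DeformConv2d(channels, channels, 3, padding=1)  # deformable branch -> O
            self.proj_b = nn.Conv2d(channels, channels, 1)                # 1x1 conv -> B
            self.proj_c = nn.Conv2d(channels, channels, 1)                # 1x1 conv -> C
            self.proj_d = nn.Conv2d(channels, channels, 1)                # 1x1 conv -> D
            self.alpha = nn.Parameter(torch.zeros(1))                     # weight coefficient alpha

        def forward(self, x):                       # x: compensated features I, B x N x H x W
            n, c, h, w = x.shape
            o = self.deform(x, self.offset(x))                            # output O
            b = self.proj_b(x).view(n, c, h * w)                          # B: N x M
            cm = self.proj_c(x).view(n, c, h * w)                         # C: N x M
            d = self.proj_d(x).view(n, c, h * w)                          # D: N x M
            s = F.softmax(torch.bmm(cm.transpose(1, 2), b), dim=-1)       # spatial attention map, M x M
            e = torch.bmm(d, s.transpose(1, 2)).view(n, c, h, w)          # S applied to D
            return self.alpha * e + o                                     # E = alpha * (S·D) + O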
In the deep deformable convolution global attention module DGPA, an additive (summation) attention module is used to suppress irrelevant background. G is the feature map of dimension N × W × H from the down-sampling stage and X is the feature map of dimension N × W × H from the up-sampling stage; each passes through a 1 × 1 convolution kernel that changes its dimension to F_int × W × H, where F_int is a preset dimension parameter of the U-Net network. The two resulting matrices are added point by point and passed through a ReLU activation layer; after a further 1 × 1 convolution kernel, the dimension of the result becomes 1 × W × H, and a weight coefficient α is then obtained through a sigmoid function. The input X is multiplied by the weight coefficient α to obtain the attention feature map. The output is normalized to between 0 and 1, and the magnitude of each value represents how strongly that point correlates with the recognition result: the larger the value, the more likely it is that the point belongs to the object of interest. Irrelevant areas are thus suppressed, improving the accuracy of image recognition.
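A minimal sketch of this additive attention module follows; it assumes G and X share the same spatial size (as in the text) and uses a single 1 × 1 convolution per branch, with names chosen for illustration only:

    import torch
    import torch.nn as nn

    class AdditiveAttentionGate(nn.Module):
        def __init__(self, channels: int, f_int: int):
            super().__init__()
            self.w_g = nn.Conv2d(channels, f_int, 1)   # G: N x W x H -> F_int x W x H
            self.w_x = nn.Conv2d(channels, f_int, 1)   # X: N x W x H -> F_int x W x H
            self.psi = nn.Conv2d(f_int, 1, 1)          # F_int x W x H -> 1 x W x H

        def forward(self, g, x):
            alpha = torch.sigmoid(self.psi(torch.relu(self.w_g(g) + self.w_x(x))))
            return x * alpha                           # attention feature map, reweighted by alpha in [0, 1]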
In model training, the cross-entropy function is adopted as the loss function of the network, and the cross entropy Loss_seg is computed as:

    Loss_seg = − Σ_{c=1}^{M} y_c · log(p_c)

where M is the number of categories, y_c is the one-hot label vector, and p_c is the probability predicted by the network model for class c.
The weight parameters θ are updated by gradient descent with a standard Adam optimizer, of the form:

    θ_{k+1} = θ_k − η · ∇_θ Loss_seg(θ_k)

where η is the learning rate and θ_k is the weight parameter at the k-th step.
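A hedged sketch of one training step is given below: pixel-wise cross entropy (softmax plus negative log-likelihood) optimized with a standard Adam optimizer. The stand-in model, the learning rate of 1e-4 and the assumption of M = 4 classes (background plus left ventricle, myocardium and right ventricle) are illustrative; in the real system the model would be the full DeU-Net:

    import torch
    import torch.nn as nn

    # Stand-in for the trained Deformable U-Net: 9 input channels (2r+1 frames x 3) -> 4 class maps.
    model = nn.Conv2d(9, 4, kernel_size=3, padding=1)
    criterion = nn.CrossEntropyLoss()                          # Loss_seg = -sum_c y_c * log(p_c)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
        """images: B x 9 x H x W stacked frames, labels: B x H x W integer class map."""
        optimizer.zero_grad()
        logits = model(images)            # B x M x H x W class scores
        loss = criterion(logits, labels)
        loss.backward()                   # gradients of Loss_seg w.r.t. theta
        optimizer.step()                  # theta_{k+1} = theta_k - eta * Adam update
        return loss.item()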
The image detection module is used to segment the 3D ventricular regions: the probability heat maps of the 3D ventricular MRI images in the test set are computed with the trained network, and the probability heat map corresponding to each ventricular MRI image is segmented according to the segmentation probability obtained by the DGPA, yielding the segmentation results, namely the left ventricle region, the myocardium region and the right ventricle region.
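A minimal sketch of this detection step is to take the per-pixel argmax of the softmax probability heat maps; the class-index order (0 background, 1 left ventricle, 2 myocardium, 3 right ventricle) is an assumption for illustration:

    import torch

    def segment(prob_maps: torch.Tensor) -> torch.Tensor:
        """prob_maps: B x M x H x W softmax probabilities -> B x H x W label map."""
        return prob_maps.argmax(dim=1)

    probs = torch.softmax(torch.randn(1, 4, 256, 256), dim=1)   # dummy probability heat maps
    masks = segment(probs)
    left_ventricle = (masks == 1)                                # boolean mask of the left-ventricle region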
In a specific case of applying the system of this embodiment, as shown in fig. 5, the acquired 3D ventricular MRI dataset is first divided into a training set and a test set. A depth space-time deformable convolution fusion module TDAM is constructed with a U-Net network to obtain the offsets of the target image within the 3D ventricular MRI video segment, and the obtained offsets are fused into the target image by deformable convolution. The result is then input into the deep deformable convolution global attention module DGPA to extract globally correlated pixel features, and irrelevant portions are suppressed with the additive attention network to obtain the segmentation result map, realizing accurate segmentation of the patient's ventricle in the 3D ventricular MRI video. The Dice score of the whole video segmentation result is 90.1%; compared with existing segmentation algorithms based on deep neural networks, the left ventricle, myocardium and right ventricle regions segmented by this system agree better with visual evaluation while maintaining high accuracy and efficiency.
The present invention is not limited to the above-described preferred embodiments. Any person can derive various other forms of deep-learning-based ventricular segmentation systems according to the teaching of the present invention, and all equivalent changes and modifications made within the scope of this application shall fall within the scope of the present invention.

Claims (9)

1. A 3D ventricular magnetic resonance video segmentation system based on deep learning, characterized by comprising a 3D ventricular magnetic resonance MRI video data preprocessing module, a Deformable convolution depth attention network Deformable U-Net and an image detection module;
the 3D ventricular magnetic resonance MRI video data preprocessing module comprises a data enhancement module and a data division module:
the data enhancement module: splitting the existing 3D ventricular MRI video data set into MRI images of each frame, expanding the data set and carrying out normalization processing on the image size;
the data dividing module: dividing the enhanced image data into a training set and a testing set; the training set and the test set both comprise complete 3D ventricular MRI images, and the training set is used for training a Deformable convolution depth attention network Deformable U-Net;
the Deformable convolution depth attention network Deformable U-Net comprises a depth space-time Deformable convolution fusion module TDAM and a depth Deformable convolution global attention module DGPA:
the depth space-time deformable convolution fusion module TDAM: the module comprises a U-Net network and a deformable convolution layer; the TDAM inputs each frame image of the continuous 3D ventricular MRI video on a time axis into a U-Net network, outputs the image as a high-dimensional characteristic compensation area offset field of the image in the MRI video segment, transmits the high-dimensional characteristic compensation area offset field and the input image into a deformable convolution layer, and calculates to obtain high-dimensional characteristic fused feature maps of the compensated image, namely the fused feature containing the information of the front frame and the rear frame;
the deep deformable convolution global attention module DGPA: the module comprises a U-Net network, a deformable convolution attention module, three summation attention modules and an output layer, wherein the deformable convolution attention module passes through a deformable convolution layer on the basis of a spatial attention module, and adds the output of the deformable convolution layer obtained through calculation with the output of the spatial attention module to finally obtain the output of the deformable convolution attention module; adding a deformable convolution attention module in a first layer skip connection of the U-Net network, and adding a sum attention module in the other three layers of skip connections; inputting the compensated high-dimensional features of the image into a U-Net network, transmitting the high-dimensional features calculated by the U-Net network into an output layer to obtain an attention feature map, and then obtaining a segmentation probability by adopting a softmax regression function, namely the probability that a certain region in the MRI image belongs to the left ventricle, the myocardium or the right ventricle;
the image detection module is used for segmenting a 3D ventricle area, the probability heat map of the 3D ventricle MRI image of the test set is calculated by using a trained network, the probability heat map corresponding to each ventricle MRI image is segmented according to the segmentation probability obtained by DGPA, and segmentation results, namely the left ventricle area, the myocardial area and the right ventricle area, are obtained.
2. The deep learning-based 3D ventricular MRI video segmentation system according to claim 1, wherein the data enhancement module expands the data set by rotation, contrast adjustment and scaling during image processing to divide the 3D ventricular MRI video into four directions of x, y, z and t, wherein x, y and z represent a spatial coordinate system, t represents a time axis, and video frames of an x-y plane are selected.
3. The 3D ventricular MRI video segmentation system based on deep learning of claim 1, characterized in that before each frame of MRI image is inputted into the Deformable U-Net network, the previous and subsequent r frames of the target frame along the time t axis direction are selected as the Deformable U-Net network input, i.e. 2r +1 frame of MRI image.
4. The deep learning based 3D ventricular MRI video segmentation system of claim 1, wherein the depth space-time deformable convolution fusion module TDAM comprises a 9-layer structure: the first layer comprises a convolution layer and a ReLU function and converts the number of input channels, (2r+1) × in_c, into nf, where in_c is the number of input image channels and nf is the number of output channels of the custom convolution layer; layers 2 to 4 are down-sampling structures comprising two convolution layers and two ReLU functions; layers 5 to 6 are up-sampling structures comprising a convolution layer, a deconvolution layer and two ReLU functions; layer 7 is a skip-transfer structure that processes the features obtained by down-sampling and fuses them with the up-sampling result, comprising two convolution layers, a deconvolution layer and three ReLU functions; layer 8 is an offset output structure comprising two convolution layers and a ReLU function, with the second convolution layer outputting (2r+1) × 2 × (deform_ks)² channels, where deform_ks is the deformable convolution kernel size; the ninth layer is a deformable convolution and a ReLU function that takes the input image and the offsets as its input and produces the high-dimensional fused feature maps of the image.
5. The deep learning-based 3D ventricular nuclear magnetic resonance video segmentation system according to claim 1, wherein in the depth space-time deformable convolution fusion module TDAM, the output size of a convolution layer is computed as:

    conv_out = (conv_in + 2 × padding − kernel_size) / stride + 1

where conv_out is the output image size of the convolution layer, conv_in is the input image size, padding is the number of pixels padded around the image, kernel_size is the convolution kernel size, and stride is the step size of the convolution kernel;
the 3 × 3 convolution kernel R is defined as R = {(−1, −1), (−1, 0), …, (0, 1), (1, 1)}, and the feature y(p_0) of the convolution layer is:

    y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n ranges over the positions in R, w(·) is the weight, x(·) is the feature of the input image, and p_0 is the initial position;
in the deformable convolution layer, the convolution kernel R is augmented with offsets {Δp_n | n = 1, …, N}, where N = |R|; thus the feature y′(p_0) of the deformable convolution is:

    y′(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

where Δp_n is the high-dimensional feature compensation region (offset) obtained from the U-Net network.
6. The deep learning-based 3D ventricular MRI video segmentation system according to claim 1, wherein in the depth space-time deformable convolution fusion module TDAM, each frame of the 3D ventricular MRI video fed into the Deformable U-Net network is denoted x_h, and the fused feature output by the TDAM is computed as:

    F(k) = Σ_h Σ_{s=1}^{S} w_{h0,h}(s) · x_h(k + k_s)

where F(k) is the resulting feature, S is the convolution kernel size, w_{h0,h} is the kernel of the h-th channel for the current channel h0, x_h is the image of the h-th channel, k is an arbitrary spatial position, and k_s is the sampling offset of the deformable convolution; the TDAM provides additional learnable offsets δ_(h,k) and δ_(h,k),s so that

    k_s ← k_s + δ_(h,k) + δ_(h,k),s

where δ_(h,k) is the offset learned for the h-th channel at spatial position k and δ_(h,k),s is the sampling offset of the learned offset.
7. The deep learning based 3D ventricular MRI video segmentation system according to claim 1, wherein the DGPA can extract correlated features across global pixel positions; the compensated high-dimensional MRI image feature I is input to a 3 × 3 deformable convolution kernel W_d in the deformable convolution attention module to obtain an output O:

    O = W_d(I)

the compensated high-dimensional MRI image feature I is also input to three 1 × 1 convolution kernels in the spatial attention module, generating new feature maps B, C, D ∈ R^{N×M}, where N is the number of channels of the feature map, M = H × W is the number of pixels of the feature map, and H and W are the height and width of the feature map, respectively; after transposition, C and B are matrix-multiplied and the result is passed through the softmax formula to obtain the spatial attention map S ∈ R^{M×M}, where each element of S is computed as:

    s_ji = exp(B_i · C_j^T) / Σ_{i=1}^{M} exp(B_i · C_j^T)

where s_ji is the element in row i and column j of S, B_i is the i-th row of the feature map B, and C_j^T is the j-th column of the transposed feature map C; the spatial attention map S is then matrix-multiplied with the feature map D, and the product is added to the result O of the preceding deformable convolution to obtain the final result E ∈ R^{N×M}:

    E_j = α · Σ_{i=1}^{M} (s_ji · D_i) + O_j

where α is a weight coefficient, D_i is the i-th row of the feature map D, and O_j is the j-th column of the output O.
8. The deep learning based 3D ventricular nuclear magnetic resonance video segmentation system of claim 7, wherein in the deep deformable convolution global attention module DGPA, an additive (summation) attention module is used to suppress irrelevant background; G is the feature map of dimension N × W × H from the down-sampling stage and X is the feature map of dimension N × W × H from the up-sampling stage; each passes through a 1 × 1 convolution kernel that changes its dimension to F_int × W × H, where F_int is a preset dimension parameter of the U-Net network; the two resulting matrices are added point by point and passed through a ReLU activation layer, the result is passed through a 1 × 1 convolution kernel so that its dimension becomes 1 × W × H, and a weight coefficient α is then obtained through a sigmoid function; the input X is multiplied by the weight coefficient α to obtain the attention feature map.
9. The deep learning-based 3D ventricular nuclear magnetic resonance video segmentation system according to claim 1, wherein during training the Deformable U-Net adopts the cross-entropy function as the loss function of the network, and the cross entropy Loss_seg is computed as:

    Loss_seg = − Σ_{c=1}^{M} y_c · log(p_c)

where M is the number of categories, y_c is the one-hot label vector, and p_c is the probability predicted by the network model for class c;
the weight parameters θ of the Deformable U-Net are updated by gradient descent with a standard Adam optimizer, of the form:

    θ_{k+1} = θ_k − η · ∇_θ Loss_seg(θ_k)

where η is the learning rate and θ_k is the weight parameter at the k-th step.
CN202010622947.1A 2020-07-01 2020-07-01 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning Active CN111932550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010622947.1A CN111932550B (en) 2020-07-01 2020-07-01 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010622947.1A CN111932550B (en) 2020-07-01 2020-07-01 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning

Publications (2)

Publication Number Publication Date
CN111932550A CN111932550A (en) 2020-11-13
CN111932550B true CN111932550B (en) 2021-04-30

Family

ID=73316977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010622947.1A Active CN111932550B (en) 2020-07-01 2020-07-01 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning

Country Status (1)

Country Link
CN (1) CN111932550B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330683B (en) * 2020-11-16 2022-07-29 的卢技术有限公司 Lineation parking space segmentation method based on multi-scale convolution feature fusion
CN112733672B (en) * 2020-12-31 2024-06-18 深圳一清创新科技有限公司 Three-dimensional target detection method and device based on monocular camera and computer equipment
CN112766195B (en) * 2021-01-26 2022-03-29 西南交通大学 Electrified railway bow net arcing visual detection method
CN113436139A (en) * 2021-05-10 2021-09-24 上海大学 Small intestine nuclear magnetic resonance image identification and physiological information extraction system and method based on deep learning
CN113283529B (en) * 2021-06-08 2022-09-06 南通大学 Neural network construction method for multi-modal image visibility detection
CN114004847B (en) * 2021-11-01 2023-06-16 中国科学技术大学 Medical image segmentation method based on graph reversible neural network
CN114155208B (en) * 2021-11-15 2022-07-08 中国科学院深圳先进技术研究院 Atrial fibrillation assessment method and device based on deep learning
CN114359310B (en) * 2022-01-13 2024-06-04 浙江大学 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning
CN116612131B (en) * 2023-05-22 2024-02-13 山东省人工智能研究院 Cardiac MRI structure segmentation method based on ADC-UNet model
CN116630628B (en) * 2023-07-17 2023-10-03 四川大学 Aortic valve calcification segmentation method, system, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447008A (en) * 2018-11-02 2019-03-08 中山大学 Population analysis method based on attention mechanism and deformable convolutional neural networks
CN109685813A (en) * 2018-12-27 2019-04-26 江西理工大学 A kind of U-shaped Segmentation Method of Retinal Blood Vessels of adaptive scale information
CN110120033A (en) * 2019-04-12 2019-08-13 天津大学 Based on improved U-Net neural network three-dimensional brain tumor image partition method
CN110163876A (en) * 2019-05-24 2019-08-23 山东师范大学 Left ventricle dividing method, system, equipment and medium based on multi-feature fusion
CN111161273A (en) * 2019-12-31 2020-05-15 电子科技大学 Medical ultrasonic image segmentation method based on deep learning
CN111275755A (en) * 2020-04-28 2020-06-12 中国人民解放军总医院 Mitral valve orifice area detection method, system and equipment based on artificial intelligence

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260651B (en) * 2018-11-30 2023-11-10 西安电子科技大学 Stomach low-quality MRI image segmentation method based on deep migration learning
CN111192245B (en) * 2019-12-26 2023-04-07 河南工业大学 Brain tumor segmentation network and method based on U-Net network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447008A (en) * 2018-11-02 2019-03-08 中山大学 Population analysis method based on attention mechanism and deformable convolutional neural networks
CN109685813A (en) * 2018-12-27 2019-04-26 江西理工大学 A kind of U-shaped Segmentation Method of Retinal Blood Vessels of adaptive scale information
CN110120033A (en) * 2019-04-12 2019-08-13 天津大学 Based on improved U-Net neural network three-dimensional brain tumor image partition method
CN110163876A (en) * 2019-05-24 2019-08-23 山东师范大学 Left ventricle dividing method, system, equipment and medium based on multi-feature fusion
CN111161273A (en) * 2019-12-31 2020-05-15 电子科技大学 Medical ultrasonic image segmentation method based on deep learning
CN111275755A (en) * 2020-04-28 2020-06-12 中国人民解放军总医院 Mitral valve orifice area detection method, system and equipment based on artificial intelligence

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Attention U-Net: Learning Where to Look for the Pancreas; Ozan Oktay et al.; arXiv:1804.03999v3 [cs.CV]; 2018-05-20; full text *
Deformable Convolutional Networks; Jifeng Dai et al.; arXiv:1703.06211v3 [cs.CV]; 2017-06-05; full text *
DUNet: A deformable network for retinal vessel segmentation; Qiangguo Jin et al.; Journal of LaTeX Class Files; 2015-08-31; Vol. 14, No. 8; full text *
Research status and development of left ventricle segmentation methods based on MRI images; Zhou Qin et al.; Computer Engineering and Applications (计算机工程与应用); 2019-12-31; Vol. 55, No. 2; full text *

Also Published As

Publication number Publication date
CN111932550A (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN111932550B (en) 3D ventricle nuclear magnetic resonance video segmentation system based on deep learning
CN111145170B (en) Medical image segmentation method based on deep learning
CN110390351B (en) Epileptic focus three-dimensional automatic positioning system based on deep learning
CN107492071B (en) Medical image processing method and equipment
US11430140B2 (en) Medical image generation, localizaton, registration system
US20200167929A1 (en) Image processing method, image processing apparatus, and computer-program product
CN114359310B (en) 3D ventricular nuclear magnetic resonance video segmentation optimization system based on deep learning
CN110930416A (en) MRI image prostate segmentation method based on U-shaped network
CN110599528A (en) Unsupervised three-dimensional medical image registration method and system based on neural network
CN111444896A (en) Method for positioning human meridian key points through far infrared thermal imaging
CN111951288A (en) Skin cancer lesion segmentation method based on deep learning
CN111161271A (en) Ultrasonic image segmentation method
CN110648331A (en) Detection method for medical image segmentation, medical image segmentation method and device
CN116258933A (en) Medical image segmentation device based on global information perception
CN116258732A (en) Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images
CN117392312A (en) New view image generation method of monocular endoscope based on deformable nerve radiation field
CN114119635B (en) Fatty liver CT image segmentation method based on cavity convolution
CN117523204A (en) Liver tumor image segmentation method and device oriented to medical scene and readable storage medium
CN113269774A (en) Parkinson disease classification and lesion region labeling method of MRI (magnetic resonance imaging) image
CN116758087A (en) Lumbar vertebra CT bone window side recess gap detection method and device
CN116757982A (en) Multi-mode medical image fusion method based on multi-scale codec
CN116309754A (en) Brain medical image registration method and system based on local-global information collaboration
CN115424319A (en) Strabismus recognition system based on deep learning
CN117274282B (en) Medical image segmentation method, system and equipment based on knowledge distillation
Sarkera et al. MobileGAN: Skin Lesion Segmentation Using a Lightweight Generative Adversarial Network [J]

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant