CN115239740A - GT-UNet-based whole-heart segmentation algorithm

Info

Publication number: CN115239740A
Application number: CN202210645929.4A
Authority: CN (China)
Legal status: Pending
Prior art keywords: convolution, dimensional, encoder, segmentation, multiplied
Other languages: Chinese (zh)
Inventors: 田沄, 刘彬, 李岩松, 赵世凤
Current and original assignee: Beijing Normal University
Application filed by: Beijing Normal University
Filing date: 2022-06-08
Publication date: 2022-10-25


Classifications

    • G06T 7/11 Region-based segmentation (image analysis; segmentation; edge detection)
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06T 2207/10004 Still image; photographic image
    • G06T 2207/10012 Stereo images
    • G06T 2207/10081 Computed x-ray tomography [CT]
    • G06T 2207/10088 Magnetic resonance imaging [MRI]
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G06T 2207/20132 Image cropping
    • G06T 2207/30048 Heart; cardiac (biomedical image processing)


Abstract

The invention provides a GT-UNet-based whole-heart segmentation algorithm comprising the following steps: preprocessing the input three-dimensional multi-modality cardiac images (including CT and MRI); converting the preprocessed data into a set of mutually independent slices, feeding the slices to a two-dimensional segmentation network for training, and outputting a class probability map a; cropping the preprocessed data into a set of independent data volumes, feeding them to a three-dimensional segmentation network for training, and outputting a class probability map b; and sending the class probability maps a and b to a fusion module, comparing them pixel by pixel, and performing whole-heart segmentation using the class with the maximum probability. The whole-heart segmentation algorithm adaptively adjusts the size of its receptive field according to the input and effectively exploits global information for long-range modeling, which effectively improves the segmentation accuracy of the algorithm.

Description

Whole-heart segmentation algorithm based on GT-UNet
Technical Field
The invention relates to the technical field of medical image processing, and in particular to a GT-UNet-based whole-heart segmentation algorithm.
Background
Automatic whole-heart segmentation is an important step in the quantitative assessment of cardiac structure and the quantitative diagnosis of heart disease: accurately extracting the complete region and boundary of the heart and then building a three-dimensional cardiac model to assist physicians in subsequent clinical diagnosis and treatment has important application value and clinical significance for cardiac surgical navigation, guidance of interventional therapy, computer-aided diagnosis and so on.
Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are common imaging modalities for diagnosing heart disease. Although physicians can obtain anatomical information about the internal structure of the heart from a patient's imaging examination, which helps subsequent non-invasive quantitative assessment of cardiac function, this also greatly increases their workload: the traditional segmentation workflow is manual reading, after which a radiologist manually delineates the boundaries with professional software. To reduce the heavy workload of imaging physicians and to improve the segmentation accuracy of cardiac structures, research on computer-aided automatic image segmentation and diagnosis is urgently needed.
To address the technical problems of existing whole-heart segmentation techniques, the invention provides a whole-heart segmentation algorithm based on GT-UNet.
Disclosure of Invention
The invention provides a GT-UNet based whole-heart segmentation algorithm.
The invention adopts the following technical scheme:
a Graph-Reasoning and transform-module based (GT-UNet) full-heart segmentation algorithm, comprising:
step 1, preprocessing input three-dimensional multi-modal cardiac images including CT and MRI;
step 2, converting the preprocessed data into a plurality of mutually independent slices, conveying the slices to a two-dimensional segmentation network for training, and outputting a class probability map a;
step 3, cutting the preprocessed data into a plurality of independent data volumes, transmitting the data volumes to a three-dimensional segmentation network for training, and outputting a class probability chart b;
and 4, sending the class probability map a and the class probability map b into a fusion module, comparing the class probability map a and the class probability map b pixel by pixel, performing full-center segmentation by using the maximum probability class, and outputting a segmentation result.
Further, in step 1, the preprocessing comprises:
step 1.1, cropping a region of interest (ROI);
step 1.2, resampling and normalizing the ROI.
Further, converting the preprocessed data into a plurality of mutually independent slices and feeding the slices to a two-dimensional segmentation network for training comprises the following steps:
step 2.1, obtaining a mapping function f(·) representing a linear combination of features, mapping the input feature map X ∈ R^{L×C} of the original coordinate space Ω into an interaction space H through the mapping function f(·), and obtaining the new features V = f(X) ∈ R^{N×C}, where N is the number of nodes in the space H and C is the desired feature dimension; the features are computed as
v_i = b_i X = Σ_j b_ij x_j ……(1),
where b_i ∈ R^{1×L} is a learnable mapping weight, x_j ∈ R^{1×C}, v_i ∈ R^{1×C}, and b_ij is a binary combination coefficient generated by a convolution operation, taking the value 0 or 1;
step 2.2, performing reasoning with a graph convolution network (GCN) to obtain a fully connected graph storing the new features, reasoning by learning the interaction edge weights corresponding to each node; a single-layer graph convolution network is defined as
Z = GVW_g = ((I − A_g)V)W_g ∈ R^{N×C} ……(2),
where A_g denotes the N×N node adjacency matrix (so that G = I − A_g), A_g is randomly initialized and learned during training, I is the identity matrix, Z ∈ R^{N×C} are the nodes after global reasoning, and W_g is the state-update function that updates the state of each node;
step 2.3, projecting the nodes Z of the interaction space H back to the original coordinate space Ω; the reverse mapping Y = g(Z) ∈ R^{L×C} is obtained from equation (3):
y_i = d_i Z = Σ_j d_ij z_j ……(3),
where y_i ∈ R^{1×C}, d_ij is a learnable reverse-mapping weighting scalar, and z_j denotes the j-th reasoning node.
Further, in step 2.1, the input dimension is reduced by instantiating the mapping function as f(X) = θ(X; W_θ)·φ(X; W_φ), i.e. V = B·φ(X; W_φ) with B = θ(X; W_θ), where B = [b_1, …, b_N] ∈ R^{N×L} are the mapping weights, φ(X; W_φ) and θ(X; W_θ) are two convolution layers, and W_φ and W_θ are the learnable convolution kernels of the respective layers.
Further, in step 3, the three-dimensional segmentation network includes: a CNN encoder F_CNN(·) that extracts multi-scale feature maps from the input image, a DeTrans encoder that processes, in an end-to-end manner, the multi-scale feature maps embedded with position encodings using attention, and a CNN decoder that produces the segmentation from the features generated by the DeTrans encoder.
Further, the CNN encoder F_CNN(·) contains a Conv-In-Relu module and three Resnet stages. The Conv-In-Relu module first performs a convolution with a 7 × 7 × 7 kernel, 64 channels and stride (1, 2, 2), followed by instance normalization and ReLU, and the result is sent to the first-stage Resnet module. This stage contains three residual units: a residual operation with stride (2, 2, 2) and a 3 × 3 × 3 kernel with 192 channels, followed by two residual operations with stride (1, 1, 1) and a 3 × 3 × 3 kernel with 192 channels, yielding 192 feature maps of size 48 × 40 × 40 that are sent to the second-stage Resnet module. Except that the number of convolution kernels is updated from 192 to 384, the second stage uses the same parameters as the first and finally outputs 384 feature maps of size 24 × 20 × 20, which are sent to the third-stage Resnet module. The third stage has two residual units: a residual operation with stride (2, 2, 2) and a 3 × 3 × 3 kernel with 384 channels, followed by a residual operation with stride (1, 1, 1) and a 3 × 3 × 3 kernel with 384 channels. The feature maps generated by F_CNN(·) are defined as
{f_l}_{l=1}^{L} = F_CNN(x; Θ),
where L denotes the number of feature levels, l is a specific level, x ∈ R^{C×D×H×W} is the input feature map, Θ denotes the parameters required by the encoder, C denotes the number of channels, H the height of the input image, W its width, and D the depth of the input data, i.e. the number of slices.
Further, the DeTrans encoder comprises a sequence layer that converts the input and a plurality of stacked deformable DeTrans layers. The DeTrans encoder flattens the feature maps f_l generated by the CNN encoder into a one-dimensional sequence of image patches, and a three-dimensional fixed position-encoding sequence p_l is embedded into the flattened one-dimensional sequence to capture the relative or absolute positions between the various substructures of the heart.
Furthermore, the CNN decoder includes four upsampling modules. Each of the first three upsampling modules contains a transposed convolution layer with stride 2 × 2 × 2 and a 2 × 2 × 2 kernel, with 384, 192 and 64 kernels respectively, followed by a three-dimensional residual block that refines the feature map; the feature map output by the encoder and the feature map obtained after the transposed convolution are then summed pixel by pixel through a skip connection so as to retain more low-level information. The final upsampling module consists of one upsampling layer and one 1 × 1 convolution layer, mapping the 64-channel feature maps to the desired number of classes.
Compared with the prior art, the invention has the following advantages:
the GT-UNet based full-center segmentation algorithm can effectively capture the global relationship and is suitable for different heart data sets, wherein the graph reasoning unit captures the global relationship by projecting the characteristics into an interactive space to carry out relationship reasoning, the Transformer module can overcome induction deviation of convolution operation and inherent limitation of local sensitivity, the size is adjusted in a self-adaptive mode according to input, and the global information is effectively utilized for remote modeling, so that the segmentation precision of the algorithm is effectively improved.
Drawings
FIG. 1 is a flowchart of the GT-UNet based whole-heart segmentation algorithm in an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention may be more clearly understood, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of the present application and the features in the embodiments can be combined with each other without conflict.
As shown in FIG. 1, the GT-UNet based whole-heart segmentation algorithm includes:
step 1, preprocessing the input three-dimensional multi-modal cardiac images, including CT and MRI;
step 2, converting the preprocessed data into a plurality of mutually independent slices, feeding the slices to a two-dimensional segmentation network for training, and outputting a class probability map a;
step 3, cropping the preprocessed data into a plurality of independent data volumes, feeding the data volumes to a three-dimensional segmentation network for training, and outputting a class probability map b;
and step 4, sending the class probability map a and the class probability map b to a fusion module, comparing them pixel by pixel, performing whole-heart segmentation using the class with the maximum probability, and outputting the segmentation result.
Specifically, in step 1, a non-zero template (mask) is generated from the input image and the volume is cropped according to the size and position of its bounding box; resampling then uses third-order spline interpolation in the xy plane and nearest-neighbor interpolation along the z axis; finally, normalization is performed with the z-score method.
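A minimal NumPy/SciPy sketch of this preprocessing is given below. The function names (crop_to_nonzero, resample, z_score), the assumed (x, y, z) axis order and the example voxel spacings are illustrative assumptions, not details fixed by the description:

    import numpy as np
    from scipy import ndimage

    def crop_to_nonzero(volume, margin=0):
        """Crop a 3D volume to the bounding box of its non-zero voxels."""
        coords = np.argwhere(volume != 0)
        lo = np.maximum(coords.min(axis=0) - margin, 0)
        hi = np.minimum(coords.max(axis=0) + 1 + margin, volume.shape)
        return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

    def resample(volume, src_spacing, dst_spacing):
        """Resample: third-order spline in-plane (x, y), nearest neighbor along z."""
        fx, fy, fz = np.asarray(src_spacing, float) / np.asarray(dst_spacing, float)
        inplane = ndimage.zoom(volume, (fx, fy, 1.0), order=3)   # xy plane: cubic spline
        return ndimage.zoom(inplane, (1.0, 1.0, fz), order=0)    # z axis: nearest neighbor

    def z_score(volume):
        """z-score normalization of the cropped ROI."""
        return (volume - volume.mean()) / (volume.std() + 1e-8)

    # Hypothetical usage with an example spacing:
    # roi = z_score(resample(crop_to_nonzero(ct_volume), (1.0, 1.0, 2.5), (1.0, 1.0, 1.0)))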
In step 2, convolutional neural networks are good at extracting local relationships but very weak at capturing global ones, and many stacked layers are usually needed to achieve the desired effect, which sharply increases the difficulty and cost of global reasoning with a CNN; since global modeling and reasoning generally benefit a global segmentation task, a global semantic reasoning unit based on graph convolution is added to the two-dimensional segmentation network in this embodiment.
The two-dimensional convolutional segmentation network specifically comprises:
step 2.1, obtaining a mapping function f(·) representing a linear combination of features, mapping the input feature map X ∈ R^{L×C} of the original coordinate space Ω into an interaction space H through the mapping function f(·), and obtaining the new features V = f(X) ∈ R^{N×C}, where N is the number of nodes in the space H and C is the desired feature dimension; the features are computed as
v_i = b_i X = Σ_j b_ij x_j ……(1),
where b_i ∈ R^{1×L} is a learnable mapping weight, x_j ∈ R^{1×C}, v_i ∈ R^{1×C}, and b_ij is a binary combination coefficient generated by a convolution operation, taking the value 0 or 1;
step 2.2, performing reasoning with a graph convolution network (GCN) to obtain a fully connected graph storing the new features, reasoning by learning the interaction edge weights corresponding to each node; a single-layer graph convolution network is defined as
Z = GVW_g = ((I − A_g)V)W_g ∈ R^{N×C} ……(2),
where A_g denotes the N×N node adjacency matrix (so that G = I − A_g), A_g is randomly initialized and learned during training, I is the identity matrix, Z ∈ R^{N×C} are the nodes after global reasoning, and W_g is the state-update function that updates the state of each node;
step 2.3, projecting the nodes Z of the interaction space H back to the original coordinate space Ω; the reverse mapping Y = g(Z) ∈ R^{L×C} is obtained from equation (3):
y_i = d_i Z = Σ_j d_ij z_j ……(3),
where y_i ∈ R^{1×C}, d_ij is a learnable reverse-mapping weighting scalar, and z_j denotes the j-th reasoning node.
As an improvement of this embodiment, in order to further reduce the input dimension of the algorithm, the mapping function is instantiated as f(X) = θ(X; W_θ)·φ(X; W_φ), i.e. V = B·φ(X; W_φ) with B = θ(X; W_θ), where B = [b_1, …, b_N] ∈ R^{N×L} are the mapping weights, φ(X; W_φ) and θ(X; W_θ) are two convolution layers, and W_φ and W_θ are the learnable convolution kernels of the respective layers.
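A minimal PyTorch sketch of such a graph-reasoning unit, following equations (1)-(3), is shown below: θ and φ are 1 × 1 convolutions that produce the mapping weights B and the reduced node features, one graph-convolution layer applies (I − A_g)V followed by W_g, and the result is projected back to the coordinate space. The class name, the residual fusion at the end, and the reuse of the transposed B as the reverse-mapping weights are illustrative assumptions rather than details fixed by the description:

    import torch
    import torch.nn as nn

    class GraphReasoningUnit(nn.Module):
        """Sketch of a graph-reasoning unit: project features into an interaction
        space (Eq. 1), reason with one graph-convolution layer (Eq. 2), and
        project back to the original coordinate space (Eq. 3)."""

        def __init__(self, in_channels, num_nodes=16, node_channels=32):
            super().__init__()
            self.theta = nn.Conv2d(in_channels, num_nodes, kernel_size=1)      # B = theta(X; W_theta)
            self.phi = nn.Conv2d(in_channels, node_channels, kernel_size=1)    # phi(X; W_phi): dimension reduction
            self.adj = nn.Parameter(torch.randn(num_nodes, num_nodes) * 0.01)  # A_g, randomly initialized and learned
            self.w_g = nn.Linear(node_channels, node_channels, bias=False)     # W_g: node state update
            self.back = nn.Conv2d(node_channels, in_channels, kernel_size=1)   # return to the input channel count

        def forward(self, x):
            n, c, h, w = x.shape
            b = self.theta(x).flatten(2)                                  # (n, N, L) with L = H*W
            v = torch.bmm(b, self.phi(x).flatten(2).transpose(1, 2))      # Eq. (1): V = B * phi(X), shape (n, N, C')
            eye = torch.eye(self.adj.size(0), device=x.device)
            z = self.w_g(torch.matmul(eye - self.adj, v))                 # Eq. (2): Z = ((I - A_g) V) W_g
            y = torch.bmm(b.transpose(1, 2), z)                           # Eq. (3): reverse projection (D = B^T assumed)
            y = y.transpose(1, 2).reshape(n, -1, h, w)
            return x + self.back(y)                                       # residual fusion into the input feature map

    # Example: unit = GraphReasoningUnit(64); out = unit(torch.randn(2, 64, 96, 96))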
In step 3, the network takes a CNN encoder-decoder as its basic framework, and a Transformer-based deformable encoder (DeTrans encoder) is inserted to model and analyze long-range dependencies. The network mainly comprises a CNN encoder, the DeTrans encoder and a CNN decoder: the CNN encoder extracts multi-scale feature maps from the input image, the DeTrans encoder processes, in an end-to-end manner, the multi-scale feature maps embedded with position encodings using attention, and the CNN decoder reconstructs the feature maps.
The CNN encoder F_CNN(·) contains a Conv-In-Relu module and three Resnet stages. The Conv-In-Relu module first performs a convolution with a 7 × 7 × 7 kernel, 64 channels and stride (1, 2, 2), followed by instance normalization and ReLU, and the result is sent to the first-stage Resnet module. This stage contains three residual units: a residual operation with stride (2, 2, 2) and a 3 × 3 × 3 kernel with 192 channels, followed by two residual operations with stride (1, 1, 1) and a 3 × 3 × 3 kernel with 192 channels, yielding 192 feature maps of size 48 × 40 × 40 that are sent to the second-stage Resnet module. Except that the number of convolution kernels is updated from 192 to 384, the second stage uses the same parameters as the first and finally outputs 384 feature maps of size 24 × 20 × 20, which are sent to the third-stage Resnet module. The third stage has two residual units: a residual operation with stride (2, 2, 2) and a 3 × 3 × 3 kernel with 384 channels, followed by a residual operation with stride (1, 1, 1) and a 3 × 3 × 3 kernel with 384 channels. The feature maps generated by F_CNN(·) are defined as
{f_l}_{l=1}^{L} = F_CNN(x; Θ),
where L denotes the number of feature levels, l is a specific level, x ∈ R^{C×D×H×W} is the input feature map, Θ denotes the parameters required by the encoder, C denotes the number of channels, H the height of the input image, W its width, and D the depth of the input data, i.e. the number of slices.
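The following PyTorch sketch illustrates one possible reading of this encoder: a Conv-In-ReLU stem (7 × 7 × 7 kernel, 64 channels, stride (1, 2, 2)) followed by three residual stages with 192, 384 and 384 channels. Only the kernel sizes, strides and channel counts come from the description; the exact residual-unit layout (normalization placement, projection shortcuts) is an assumption:

    import torch
    import torch.nn as nn

    class Residual3D(nn.Module):
        """3D residual unit: two 3x3x3 convolutions with instance norm; the first
        convolution carries the stride, the shortcut is projected when needed."""
        def __init__(self, in_ch, out_ch, stride):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1),
                nn.InstanceNorm3d(out_ch), nn.ReLU(inplace=True),
                nn.Conv3d(out_ch, out_ch, 3, padding=1),
                nn.InstanceNorm3d(out_ch))
            self.skip = nn.Conv3d(in_ch, out_ch, 1, stride=stride) \
                if (in_ch != out_ch or stride != 1) else nn.Identity()
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(self.body(x) + self.skip(x))

    class CNNEncoder(nn.Module):
        """Sketch of F_CNN: Conv-In-ReLU stem plus three ResNet stages, returning
        the multi-scale feature maps that feed the DeTrans encoder."""
        def __init__(self, in_ch=1):
            super().__init__()
            self.stem = nn.Sequential(
                nn.Conv3d(in_ch, 64, kernel_size=7, stride=(1, 2, 2), padding=3),
                nn.InstanceNorm3d(64), nn.ReLU(inplace=True))
            self.stage1 = nn.Sequential(Residual3D(64, 192, 2),
                                        Residual3D(192, 192, 1), Residual3D(192, 192, 1))
            self.stage2 = nn.Sequential(Residual3D(192, 384, 2),
                                        Residual3D(384, 384, 1), Residual3D(384, 384, 1))
            self.stage3 = nn.Sequential(Residual3D(384, 384, 2), Residual3D(384, 384, 1))

        def forward(self, x):
            s = self.stem(x)
            f1 = self.stage1(s)
            f2 = self.stage2(f1)
            f3 = self.stage3(f2)
            return s, f1, f2, f3        # stem output kept as the shallowest skip feature

    # Example: for a 96 x 160 x 160 patch, f1 has size 48 x 40 x 40 and f2 has size 24 x 20 x 20,
    # matching the sizes stated above:
    # s, f1, f2, f3 = CNNEncoder()(torch.randn(1, 1, 96, 160, 160))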
To overcome the inductive bias of convolution operations and the inherent limitation of local sensitivity, this embodiment introduces a DeTrans encoder whose core is a multi-scale deformable self-attention (MS-DMSA) mechanism for capturing long-range pixel dependencies. The DeTrans encoder consists of a sequence layer that converts the input and a plurality of stacked deformable DeTrans layers. Since a Transformer can only process data in a sequence-to-sequence manner and contains no recurrence or convolution operations, the feature maps f_l generated by the CNN encoder must be flattened into a one-dimensional sequence of image patches; however, doing so directly inevitably loses some key spatial position relationships, so a three-dimensional fixed position-encoding sequence p_l is embedded into the flattened one-dimensional sequence to capture the relative or absolute positions between the various substructures of the heart.
In this embodiment, trigonometric functions whose wavelengths form a geometric progression from 2π to 10000·2π are used to encode each position pos along each dimension, in the standard sinusoidal form
PE_ψ(pos, 2i) = sin(pos / 10000^{2i/C_ψ}), PE_ψ(pos, 2i+1) = cos(pos / 10000^{2i/C_ψ}), ψ ∈ {D, H, W},
where pos denotes the position, i the dimension index, C_ψ the number of encoding channels of axis ψ, and ψ ∈ {D, H, W} indexes the depth, height and width of the input image, respectively. For each feature level l, PE_D, PE_H and PE_W are concatenated into a three-dimensional position code p_l, which is then added element by element to the flattened f_l to obtain the input sequence of the DeTrans encoder.
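A small sketch of such a three-dimensional fixed position encoding, assuming the standard sinusoidal form with an even number of channels per axis and simple per-axis concatenation (function names are illustrative), might look as follows:

    import torch

    def sinusoidal_1d(length, channels):
        """Sinusoidal encoding for one axis: wavelengths form a geometric
        progression from 2*pi to 10000*2*pi; channels is assumed even."""
        pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)   # (length, 1)
        i = torch.arange(0, channels, 2, dtype=torch.float32)          # even dimension indices
        div = torch.pow(10000.0, i / channels)
        pe = torch.zeros(length, channels)
        pe[:, 0::2] = torch.sin(pos / div)
        pe[:, 1::2] = torch.cos(pos / div)
        return pe

    def position_encoding_3d(d, h, w, channels):
        """Concatenate PE_D, PE_H and PE_W into one code of shape (3*channels, d, h, w),
        to be added element-wise to the flattened feature sequence."""
        pe_d = sinusoidal_1d(d, channels)[:, None, None, :].expand(d, h, w, channels)
        pe_h = sinusoidal_1d(h, channels)[None, :, None, :].expand(d, h, w, channels)
        pe_w = sinusoidal_1d(w, channels)[None, None, :, :].expand(d, h, w, channels)
        return torch.cat([pe_d, pe_h, pe_w], dim=-1).permute(3, 0, 1, 2)

    # Example: p = position_encoding_3d(12, 10, 10, 128)   # (384, 12, 10, 10)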
The self-attention layer in the original Transformer attends to all possible spatial positions according to the size of the feature map. The invention instead introduces an attention module that focuses only on key sampling points around a reference point, named MS-DMSA, which reduces the number of parameters and the computational cost. Let z_q ∈ R^C be the feature representation of query q and p̂_q ∈ [0, 1]^3 the normalized three-dimensional coordinates of its reference point. Given the multi-scale feature maps {f_l}_{l=1}^{L} extracted from the last L stages of the CNN encoder, the i-th attention head is computed as
head_i = Σ_{l=1}^{L} Σ_{k=1}^{K} A(z_q)_ilqk · f_l(σ_l(p̂_q) + Δp_ilqk),
where K is the number of sampled key points, A(z_q)_ilqk ∈ [0, 1] is the attention weight, Δp_ilqk ∈ R^3 is the sampling offset of the k-th sampling point in the l-th feature level, and σ_l(·) rescales p̂_q to the l-th feature level; both A(z_q)_ilqk and Δp_ilqk are obtained by linear projection of the query feature z_q. The MS-DMSA layer is then defined as
MS-DMSA(z_q, p̂_q, {f_l}_{l=1}^{L}) = Φ(head_1, …, head_H),
where H is the number of attention heads and Φ(·) denotes a linear projection layer that weights and aggregates the features of all attention heads. A DeTrans layer consists of an MS-DMSA layer and a feed-forward network, each with a skip connection and layer normalization; the DeTrans encoder is created by repeatedly stacking DeTrans layers, and the output sequence is then reshaped back into feature maps according to the three-dimensional scale sizes.
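The sketch below illustrates the MS-DMSA idea for a single feature level in PyTorch: sampling offsets and attention weights are linear projections of the query feature, and only K points around each normalized reference point are sampled with trilinear interpolation. Restricting it to one level and giving every head the full channel width are simplifications made for readability, not properties of the multi-scale module described above:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DeformableSelfAttention3D(nn.Module):
        """Single-level sketch of deformable self-attention: each query attends
        only to K sampled key points around its normalized reference point."""

        def __init__(self, channels, heads=4, points=4):
            super().__init__()
            self.h, self.k, self.c = heads, points, channels
            self.offsets = nn.Linear(channels, heads * points * 3)   # sampling offsets
            self.weights = nn.Linear(channels, heads * points)       # attention weights A(z_q)
            self.value = nn.Conv3d(channels, channels, kernel_size=1)
            self.out = nn.Linear(channels, channels)                 # Phi: aggregate the heads

        def forward(self, queries, ref_points, feat):
            # queries: (B, Lq, C); ref_points: (B, Lq, 3) in [0, 1], (d, h, w) order
            # feat:    (B, C, D, H, W), one feature level from the CNN encoder
            b, lq, _ = queries.shape
            val = self.value(feat)
            offs = self.offsets(queries).view(b, lq, self.h, self.k, 3)
            attn = self.weights(queries).view(b, lq, self.h, self.k).softmax(dim=-1)
            loc = (ref_points[:, :, None, None, :] + offs).clamp(0, 1) * 2 - 1
            grid = loc.flip(-1).reshape(b, lq, self.h * self.k, 1, 3)   # grid_sample wants (x, y, z)
            samp = F.grid_sample(val, grid, align_corners=True)         # (B, C, Lq, heads*K, 1)
            samp = samp.view(b, self.c, lq, self.h, self.k)
            head = (samp * attn.unsqueeze(1)).sum(dim=-1)               # weight the K sampled keys
            return self.out(head.mean(dim=-1).transpose(1, 2))          # (B, Lq, C)

    # Example: attn = DeformableSelfAttention3D(384)
    # y = attn(torch.randn(1, 100, 384), torch.rand(1, 100, 3), torch.randn(1, 384, 12, 10, 10))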
In this embodiment, the CNN decoder includes four upsampling modules. Each of the first three modules contains a transposed convolution layer with stride 2 × 2 × 2 and a 2 × 2 × 2 kernel, with 384, 192 and 64 kernels respectively, followed by a three-dimensional residual block that refines the feature map; the feature map output by the encoder and the feature map obtained after the transposed convolution are summed pixel by pixel through a skip connection, retaining more low-level information. The last module consists of one upsampling layer and one 1 × 1 convolution layer that maps the 64-channel feature maps to the desired number of classes.
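A possible PyTorch reading of this decoder is sketched below: three upsampling blocks (transposed 2 × 2 × 2 convolutions with 384, 192 and 64 kernels, each followed by a residual-style refinement and a pixel-wise skip summation) and a final upsampling plus 1 × 1 convolution head. The skip channel counts and the trilinear final upsampling are assumptions chosen to match the encoder sketch above:

    import torch
    import torch.nn as nn

    class UpBlock(nn.Module):
        """Decoder stage: 2x2x2 transposed convolution (stride 2), a 3D refinement
        block, then pixel-wise summation with the encoder skip feature."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
            self.refine = nn.Sequential(
                nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.InstanceNorm3d(out_ch), nn.ReLU(inplace=True),
                nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.InstanceNorm3d(out_ch), nn.ReLU(inplace=True))

        def forward(self, x, skip):
            return self.refine(self.up(x)) + skip      # keep more low-level information

    class CNNDecoder(nn.Module):
        """Three UpBlocks (384, 192, 64 kernels) plus a final upsampling layer and
        a 1x1 convolution mapping the 64-channel maps to the number of classes."""
        def __init__(self, num_classes):
            super().__init__()
            self.up1, self.up2, self.up3 = UpBlock(384, 384), UpBlock(384, 192), UpBlock(192, 64)
            self.head = nn.Sequential(
                nn.Upsample(scale_factor=(1, 2, 2), mode='trilinear', align_corners=False),
                nn.Conv3d(64, num_classes, kernel_size=1))

        def forward(self, stem, f1, f2, f3):
            # stem/f1/f2/f3: encoder features with 64/192/384/384 channels, shallow to deep
            x = self.up1(f3, f2)
            x = self.up2(x, f1)
            x = self.up3(x, stem)
            return self.head(x)

    # Example (with the hypothetical encoder sketch above):
    # logits = CNNDecoder(num_classes=8)(*CNNEncoder()(torch.randn(1, 1, 96, 160, 160)))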
In step 4, the fusion module fuses the outputs of the two sub-networks. First, the preprocessed data are converted into a plurality of mutually independent slices and fed to the two-dimensional segmentation network for training, which outputs a class probability map a; the preprocessed data are also cropped into a plurality of data volumes and fed to the three-dimensional segmentation network for training, which outputs a label prediction probability map b. The two probability maps are then sent to the fusion module for pixel-by-pixel comparison, and the final whole-heart segmentation is obtained from the class with the maximum probability.
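One straightforward reading of this maximum-probability fusion rule is sketched below; the function name and the assumption that both networks produce softmax probability maps on the same voxel grid are illustrative:

    import torch

    def fuse_probability_maps(prob_2d, prob_3d):
        """Pixel-wise maximum-probability fusion of the 2D and 3D network outputs."""
        # prob_2d, prob_3d: (num_classes, D, H, W) softmax maps on the same voxel grid
        per_class_max = torch.maximum(prob_2d, prob_3d)   # element-wise max of the two maps
        return per_class_max.argmax(dim=0)                # (D, H, W) label map: class with largest probability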
The present invention is not limited to the above-described embodiments, which are described in the specification and illustrated only to explain the principle of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims.

Claims (8)

1. A GT-UNet based whole-heart segmentation algorithm, comprising:
step 1, preprocessing an input three-dimensional multi-modal cardiac image;
step 2, converting the preprocessed data into a plurality of mutually independent slices, feeding the slices to a two-dimensional segmentation network for training, and outputting a class probability map a;
step 3, cropping the preprocessed data into a plurality of independent data volumes, feeding the data volumes to a three-dimensional segmentation network for training, and outputting a class probability map b;
and step 4, sending the class probability map a and the class probability map b to a fusion module, comparing them pixel by pixel, and performing whole-heart segmentation using the class with the maximum probability.
2. The GT-UNet based whole-heart segmentation algorithm according to claim 1, wherein, in step 1, the preprocessing comprises:
step 1.1, cropping an ROI;
step 1.2, resampling and normalizing the ROI.
3. The GT-UNet based whole-heart segmentation algorithm according to claim 1, wherein converting the preprocessed data into a plurality of mutually independent slices and feeding the slices to the two-dimensional segmentation network for training in step 2 comprises:
step 2.1, obtaining a mapping function f(·) representing a linear combination of features, mapping the input feature map X ∈ R^{L×C} of the original coordinate space Ω into an interaction space H through the mapping function f(·), and obtaining the new features V = f(X) ∈ R^{N×C}, where N is the number of nodes in the space H and C is the desired feature dimension; the features are computed as
v_i = b_i X = Σ_j b_ij x_j ……(1),
where b_i ∈ R^{1×L} is a learnable mapping weight, x_j ∈ R^{1×C}, v_i ∈ R^{1×C}, and b_ij is a binary combination coefficient generated by a convolution operation, taking the value 0 or 1;
step 2.2, performing reasoning with a graph convolution network GCN to obtain a fully connected graph storing the new features, reasoning by learning the interaction edge weights corresponding to each node, a single-layer graph convolution network being defined as
Z = GVW_g = ((I − A_g)V)W_g ∈ R^{N×C} ……(2),
where A_g denotes the N×N node adjacency matrix (so that G = I − A_g), A_g is randomly initialized and learned during training, I is the identity matrix, Z ∈ R^{N×C} are the nodes after global reasoning, and W_g is the state-update function that updates the state of each node;
step 2.3, projecting the nodes Z of the interaction space H back to the original coordinate space Ω, the reverse mapping Y = g(Z) ∈ R^{L×C} being obtained from equation (3):
y_i = d_i Z = Σ_j d_ij z_j ……(3),
where y_i ∈ R^{1×C}, d_ij is a learnable reverse-mapping weighting scalar, and z_j denotes the j-th reasoning node.
4. The GT-UNet based whole-heart segmentation algorithm according to claim 3, wherein, in step 2.1, the input dimension is reduced by instantiating the mapping function as f(X) = θ(X; W_θ)·φ(X; W_φ), i.e. V = B·φ(X; W_φ) with B = θ(X; W_θ), where B = [b_1, …, b_N] ∈ R^{N×L} are the mapping weights, φ(X; W_φ) and θ(X; W_θ) are two convolution layers, and W_φ and W_θ are the learnable convolution kernels of the respective layers.
5. The GT-UNet based whole-heart segmentation algorithm according to claim 1, wherein, in step 3, the three-dimensional segmentation network comprises: a CNN encoder that extracts multi-scale feature maps from the input image, a DeTrans encoder that processes, in an end-to-end manner, the multi-scale feature maps embedded with position encodings using attention, and a CNN decoder that produces the segmentation from the features generated by the DeTrans encoder.
6. The GT-UNet based whole-heart segmentation algorithm according to claim 5, wherein the CNN encoder F_CNN(·) contains a Conv-In-Relu module and three Resnet stages, wherein the Conv-In-Relu module first performs a convolution with a 7 × 7 × 7 kernel, 64 channels and stride (1, 2, 2), followed by instance normalization and ReLU; the result is sent to the first-stage Resnet module, which contains three residual units: a residual operation with stride (2, 2, 2) and a 3 × 3 × 3 kernel with 192 channels, followed by two residual operations with stride (1, 1, 1) and a 3 × 3 × 3 kernel with 192 channels, yielding 192 feature maps of size 48 × 40 × 40 that are sent to the second-stage Resnet module; except that the number of convolution kernels is updated from 192 to 384, the second stage uses the same parameters as the first and finally outputs 384 feature maps of size 24 × 20 × 20, which are sent to the third-stage Resnet module, which has two residual units: a residual operation with stride (2, 2, 2) and a 3 × 3 × 3 kernel with 384 channels, followed by a residual operation with stride (1, 1, 1) and a 3 × 3 × 3 kernel with 384 channels; the feature maps generated by F_CNN(·) are defined as
{f_l}_{l=1}^{L} = F_CNN(x; Θ),
where L denotes the number of feature levels, l is a specific level, x ∈ R^{C×D×H×W} is the input feature map, Θ denotes the parameters required by the encoder, C denotes the number of channels, H the height of the input image, W its width, and D the depth of the input data, i.e. the number of slices.
7. The GT-UNet based whole-heart segmentation algorithm according to claim 5, wherein the DeTrans encoder comprises a sequence layer that converts the input and a plurality of stacked deformable DeTrans layers; the DeTrans encoder flattens the feature maps f_l generated by the CNN encoder into a one-dimensional sequence of image patches, and a three-dimensional fixed position-encoding sequence p_l is embedded into the flattened one-dimensional sequence to capture the relative or absolute positions between the various substructures of the heart.
8. The GT-UNet based whole-heart segmentation algorithm according to claim 5, wherein the CNN decoder comprises four upsampling modules; each of the first three upsampling modules comprises a transposed convolution layer with stride 2 × 2 × 2 and a 2 × 2 × 2 kernel, the numbers of the corresponding convolution kernels being 384, 192 and 64, respectively, and the feature map is refined by a three-dimensional residual block, after which the feature map output by the encoder and the feature map obtained after the transposed convolution are summed pixel by pixel through a skip connection so as to retain more low-level information; the final upsampling module consists of one upsampling layer and one 1 × 1 convolution layer, mapping the 64-channel feature maps to the desired number of classes.

Priority Applications (1)

Application Number: CN202210645929.4A
Priority Date: 2022-06-08
Filing Date: 2022-06-08
Title: GT-UNet-based whole-heart segmentation algorithm

Publications (1)

Publication Number: CN115239740A
Publication Date: 2022-10-25

Family

ID: 83670140

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117333777A * | 2023-12-01 | 2024-01-02 | 山东元明晴技术有限公司 | Dam anomaly identification method, device and storage medium
CN117333777B * | 2023-12-01 | 2024-02-13 | 山东元明晴技术有限公司 | Dam anomaly identification method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination