CN116681894A - Adjacent layer feature fusion Unet multi-organ segmentation method, system, equipment and medium combining large-kernel convolution - Google Patents

Adjacent layer feature fusion Unet multi-organ segmentation method, system, equipment and medium combining large-kernel convolution

Info

Publication number
CN116681894A
Authority
CN
China
Prior art keywords
convolution
layer
features
feature
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310707529.6A
Other languages
Chinese (zh)
Inventor
王蓉芳
牟钊汕
郝红侠
缑水平
焦昶哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310707529.6A priority Critical patent/CN116681894A/en
Publication of CN116681894A publication Critical patent/CN116681894A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A method, a system, equipment and a medium for adjacent layer feature fusion UNet multi-organ segmentation combining large-kernel convolution are provided. The method constructs an adjacent layer feature fusion UNet multi-organ segmentation model combining large-kernel depth convolution: large-kernel depth convolution is added to the encoder network to combine local and global information; an adjacent layer feature fusion module is constructed so that the model fully utilizes the information among features of different layers; and a large-kernel GRN channel response module is constructed to model long-range dependencies on the features fused from adjacent layers. As the number of feature channels increases, the large-kernel GRN channel response module applies global response normalization to the fused feature channels, so that the channels can be contrasted and selected and the fused features improve the segmentation performance of the whole model. The system, the equipment and the medium can segment organ images based on the segmentation method. The method uses few parameters, reduces the complexity of organ segmentation, and offers good practicality and high efficiency.

Description

Adjacent layer feature fusion Unet multi-organ segmentation method, system, equipment and medium combining large-kernel convolution
Technical Field
The invention relates to the technical field of multi-organ segmentation of medical images, in particular to a method, a system, equipment and a medium for multi-organ segmentation of adjacent layer feature fusion Unet combined with large-kernel convolution.
Background
Organ segmentation in medical images is a fundamental prerequisite for many clinical applications, such as computer-aided diagnosis (CAD) and computer-aided surgery (CAS). Automated and accurate segmentation of multiple organs is an important but challenging task for computer-aided diagnosis and image-guided surgery systems. Accurate segmentation is an important component of many clinical applications in which the contours of regions of interest are manually delineated by physicians; manual delineation is cumbersome, time-consuming and laborious, and because organ structures and backgrounds are complex and organ boundaries are blurred, manually delineating organ contours on medical images is challenging. In recent years, deep learning has developed rapidly in the field of medical imaging. Performing multi-organ medical image segmentation with deep learning can accurately delineate organ contours and locate lesion areas, quickly assist physicians in locating targets, reduce their workload and improve diagnostic efficiency, which is of positive and important value for modern clinical applications.
The related art schemes include a conventional multi-organ segmentation method, a conventional Machine Learning (ML) -based multi-organ segmentation method, and a Deep Learning (DL) -based multi-organ segmentation method.
Traditional segmentation methods usually rely on classical image segmentation techniques such as threshold segmentation, edge detection segmentation and region growing segmentation, which involve manual work, target boundaries and mathematical models and give poor results when tissues and organs are complex and boundaries are blurred or overlapping. Multi-organ segmentation methods based on traditional machine learning are driven by machine learning algorithms; for example, atlas-based segmentation uses prior knowledge by registering a manual segmentation or a predefined structure outline to the target image through label propagation.
In recent years, deep learning technology has developed rapidly, and convolutional neural networks (CNNs) have been successfully applied to medical image segmentation by virtue of their strong feature extraction capability. Among the various CNN variants, the U-Net network model and its derivatives have been state-of-the-art medical segmentation models for many years due to their simple architecture and excellent performance. However, CNN-based models are often limited in capturing long-range relationships because of the inherent locality of convolution operations. Recently, with the advent of the Vision Transformer (ViT), and in particular the Swin Transformer (Swin-T), ViT-based models have become medical segmentation backbones thanks to their superior performance. The shifted-window scheme in Swin-T can overcome the limitations of high-resolution input while preserving the global self-attention advantage of the Transformer. Although Swin-T reduces model complexity, ViT-based models typically have large parameter counts and require more labeled samples and computational resources. Furthermore, in the field of semantic segmentation, fusing features of different layers improves segmentation performance, as demonstrated by recent work such as PSPNet and HRNet for natural image segmentation and UNet++ and UNet3+ for medical image segmentation. These studies indicate that fusing low-level features, which carry more detail, with high-level features, which carry more semantics, can improve segmentation performance in each field. However, these methods fuse all features, which can lead to high computational complexity, and they apply only a simple linear mapping to the fused features, which is not deep enough to fully exploit them.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a method, a system, equipment and a medium for adjacent layer feature fusion UNet multi-organ segmentation combining large-kernel convolution, in which medical images can be segmented effectively and efficiently by using large-kernel residual connections, adjacent layer feature fusion and large-kernel GRN channel response, good segmentation performance is achieved with low complexity and parameter count, few parameters are used, the complexity of organ segmentation is reduced, and good practicality and high efficiency are obtained.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the adjacent layer feature fusion Unet multi-organ segmentation method combining large-kernel convolution is used for carrying out data set division, preprocessing data, data sampling and data enhancement on a sample containing a tag, then constructing an adjacent layer feature fusion UNet multi-organ segmentation model combining large-kernel depth convolution, and constructing an adjacent layer feature fusion module, so that the model fully utilizes information among different layer features to obtain lower layer features with more details and higher layer features with more semantics; and constructing a large-core GRN channel response module, modeling long-distance dependency on the characteristics fused by the characteristics of adjacent layers, and under the condition that the number of characteristic channels is increased, carrying out global response normalization on the channels fused with the characteristics by the large-core GRN channel response module, so that the channels are compared and selected, and the segmentation performance of the whole model is improved by utilizing the fused characteristics.
The constructing of the adjacent layer feature fusion UNet multi-organ segmentation model combined with large-kernel depth convolution comprises the following steps: constructing an encoder network, a base network of a decoder, an adjacent layer feature fusion module and a large core GRN channel response module; adding large-core depth convolution in the encoder network, combining the large-core depth convolution with 3×3 convolution, and combining local information and global information.
A method for segmenting multiple organs by combining large-kernel convolution and adjacent layer feature fusion Unet comprises the following specific steps:
s1, data set division
Randomly dividing a sample containing a label in the image data set into a training set and a testing set;
s2, data preprocessing
The image data are resampled after the data set is divided, to eliminate differences between images from different sources and to facilitate the calculation and comparison of features in the images: the 3D CT data are resampled to the same resolution, where bilinear interpolation is used for the image samples and nearest-neighbor interpolation is used for the label samples;
The data are normalized to eliminate the adverse effect of singular sample data and to accelerate the convergence of network training, according to the formula:

R = (I − min(I)) / (max(I) − min(I))

where R represents the CT data after normalization, W_r and H_r represent the width and height of the resolution of the normalized CT data, Z_r represents the number of slices, I represents the CT values before normalization, max(I) is the maximum CT value, and min(I) is the minimum CT value;
Centering on the target slice and stacking the slices above and below it as the network input: first, a target slice is selected along the z-axis of the normalized R, the adjacent slices are stacked with the target slice as the center, and a volume of size W_r × H_r × s is taken as the input of the network, where s represents the number of stacked adjacent slices; if the number of slices to stack is insufficient, corresponding mirror filling is performed; the whole process is as follows:
assuming that i represents a slice at the z-axis position in R after normalization, the network input can be expressed as:
X=[i-1,i,i+1]
where X represents the input to the network and W_r and H_r represent the width and height of the original resolution of the CT data; when i = 1, i.e. i is the first slice along the z-axis of R, X = [1, 1, i+1]; when i = i_max, i.e. i is the last slice along the z-axis of R, X = [i−1, i_max, i_max];
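As an illustration of the preprocessing in step S2 (min-max normalization and adjacent-slice stacking with mirror filling), a minimal NumPy sketch may look as follows; the function names, the array layout (Zr, Hr, Wr) and the default CT window are assumptions for illustration, not the patent's reference code.

import numpy as np

def normalize_ct(volume, ct_min=-125.0, ct_max=275.0):
    # Min-max normalization R = (I - min(I)) / (max(I) - min(I)),
    # after clipping to the CT value window used for the target organs.
    v = np.clip(volume.astype(np.float32), ct_min, ct_max)
    return (v - ct_min) / (ct_max - ct_min)

def stack_adjacent_slices(volume, i, s=3):
    # Build the network input X = [i-1, i, i+1] centered on slice i along the z-axis;
    # out-of-range indices are mirror-filled by repeating the border slice.
    z = volume.shape[0]                       # assumed layout: (Zr, Hr, Wr)
    half = s // 2
    idx = np.clip(np.arange(i - half, i + half + 1), 0, z - 1)
    return volume[idx]                        # shape (s, Hr, Wr)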
S3, data sampling
The data X obtained by the processing in step S2 are sampled, i.e. X is traversed and sampled sequentially; in sequential sampling, a slice window of size H×W is moved over X with step size S, sampling X from left to right and from top to bottom, where H and W denote the height and width of the slice window; if the slice window falls outside the range of X during sampling, it is shifted back so that it stays within X; if H and W are greater than W_r and H_r of X, X is padded around its borders;
s4, data enhancement
Data augmentation is performed to expand the training data and avoid the overfitting caused by training with few samples: the data processed in step S3 are, with a given probability, flipped horizontally, flipped vertically, rotated by −90° and 90°, and translated horizontally left and right;
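A minimal sketch of the probabilistic augmentation of step S4 (horizontal/vertical flips, ±90° rotations, horizontal translation); the probability p and the shift range are assumed values for illustration.

import random
import numpy as np

def augment(image, label, p=0.5, max_shift=20):
    # image, label: 2D arrays (H, W); the same transform is applied to both.
    if random.random() < p:
        image, label = np.fliplr(image).copy(), np.fliplr(label).copy()
    if random.random() < p:
        image, label = np.flipud(image).copy(), np.flipud(label).copy()
    if random.random() < p:
        k = random.choice([1, 3])             # 90 degree or -90 degree rotation
        image, label = np.rot90(image, k).copy(), np.rot90(label, k).copy()
    if random.random() < p:
        shift = random.randint(-max_shift, max_shift)
        image, label = np.roll(image, shift, axis=1), np.roll(label, shift, axis=1)
    return image, label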
s5, constructing an adjacent layer feature fusion Unet multi-organ segmentation network combined with large-kernel depth convolution, and naming the network as ASF-LKUNet;
s6, training an ASF-LKUNet segmentation network constructed in the step S5;
S7, the optimal model obtained by training in step S6 is tested using the test set from step S1 and the data processed as in step S3, and the segmentation performance of the model is quantitatively evaluated using the DSC coefficient and the 95th-percentile Hausdorff distance (HD95); HD95 measures the distance between two sets, and the smaller the value, the smaller the distance between the two sets; it is calculated as follows:

HD95(Y, Ŷ) = max( d(Y, Ŷ), d(Ŷ, Y) )

where d(Y, Ŷ) is the one-way Hausdorff distance from the label feature map Y to the segmentation feature map Ŷ, d(Ŷ, Y) is the one-way Hausdorff distance from the segmentation feature map Ŷ to the label feature map Y, and max(·) is computed by sorting the distances between the boundary points of Y and Ŷ from small to large and taking the distance at the 95th percentile;

The higher the DSC coefficient and the lower the HD95, the better the segmentation performance of the model; together they comprehensively evaluate the segmentation performance of the model.
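For illustration, the two evaluation metrics can be computed for a single binary organ mask roughly as follows; the use of scipy.ndimage and scipy.spatial to extract boundary points and pairwise distances is an implementation choice, not prescribed by the method.

import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial.distance import cdist

def dsc(pred, label):
    # DSC = 2 |pred ∩ label| / (|pred| + |label|)
    inter = np.logical_and(pred, label).sum()
    return 2.0 * inter / (pred.sum() + label.sum() + 1e-8)

def hd95(pred, label):
    # 95th percentile of the symmetric boundary-to-boundary distances.
    def boundary(mask):
        return np.argwhere(mask & ~binary_erosion(mask))
    bp, bl = boundary(pred.astype(bool)), boundary(label.astype(bool))
    d = cdist(bp, bl)                          # pairwise distances between boundary points
    d_pl = d.min(axis=1)                       # prediction boundary -> label boundary
    d_lp = d.min(axis=0)                       # label boundary -> prediction boundary
    return max(np.percentile(d_pl, 95), np.percentile(d_lp, 95))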
The specific process of the data sampling in the step S3 is as follows:
S301, first, the fill lengths of X along the x and y axes are calculated as follows:

P_x = (S_x − ((H_r − H) mod S_x)) mod S_x

P_y = (S_y − ((W_r − W) mod S_y)) mod S_y

where P_x is the fill length of X along the x-axis, P_y is the fill length of X along the y-axis, S_x and S_y denote the sampling step sizes in the x and y directions (S_x and S_y take the same step size), mod denotes the remainder operation, H_r and W_r represent the height and width of the original resolution of the CT data, and H and W denote the height and width of the slice window; if H_r and W_r of X are greater than H and W, X is padded along its x and y axes by P_x and P_y respectively; if H is greater than H_r, X is padded along its x-axis by P_x; if W is greater than W_r, X is padded along its y-axis by P_y;
S302, the number of slices along the x-axis and y-axis directions is calculated according to step S301;

Let the number of slices in the x direction be N_x; it can be expressed as:

N_x = (H_r + P_x − H) | S_x + 1

where | denotes integer division;

The number of slices in the y direction, N_y, can be expressed as:

N_y = (W_r + P_y − W) | S_y + 1
S303, the coordinates are calculated from the slice window size and the step S; the coordinates x′ in the x direction can be expressed as:

x′ = [x′_1, x′_2, ..., x′_i, ..., x′_n]

where x′_i = (i−1)·S_x, i = 1, ..., n, n denotes the number of slices N_x in the x direction, and when i = n, x′_n = H_r − H;

The coordinates y′ in the y direction can be expressed as:

y′ = [y′_1, y′_2, ..., y′_i, ..., y′_n]

where y′_i = (i−1)·S_y, i = 1, ..., n, n denotes the number of slices N_y in the y direction, and when i = n, y′_n = W_r − W;
S304, the slice window of size H×W is moved over X with step S, sampling sequentially from left to right and from top to bottom in the xy plane of X to obtain the slices B; the specific process is as follows:

From the x′ and y′ coordinates of step S303, the upper-left-corner coordinates v of the patches to be sampled from X are obtained:

v = [v_1, v_2, ..., v_n]

where n = N_x·N_y and each v_i is a pair of coordinates (x′_j, y′_k);

The window is positioned on X according to the coordinates v and a patch of the slice window size is cropped to obtain the sampling slices B, according to the formula:

B = [B_1, B_2, ..., B_n]

where n = N_x·N_y and B_i = X(v_i); B_i denotes the i-th sampled slice, and X(v_i) denotes locating the coordinate v_i in X and then cropping a patch of the slice window size from X to obtain the sampling slice.
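A sketch of the sliding-window sampling of steps S301–S304 (fill lengths, slice counts, clamped coordinates and cropping); the (H, W) array layout, the reflect padding mode and the helper name are assumptions for illustration.

import numpy as np

def sample_patches(x, H=256, W=256, Sx=128, Sy=128):
    # x: array of shape (Hr, Wr) or (Hr, Wr, C); assumes the padded image is at least H x W.
    Hr, Wr = x.shape[0], x.shape[1]
    Px = (Sx - ((Hr - H) % Sx)) % Sx           # fill length along x
    Py = (Sy - ((Wr - W) % Sy)) % Sy           # fill length along y
    pad = [(0, Px), (0, Py)] + [(0, 0)] * (x.ndim - 2)
    xp = np.pad(x, pad, mode="reflect")
    Nx = (Hr + Px - H) // Sx + 1               # number of slices along x
    Ny = (Wr + Py - W) // Sy + 1               # number of slices along y
    xs = [min(i * Sx, xp.shape[0] - H) for i in range(Nx)]   # clamp the last coordinate
    ys = [min(j * Sy, xp.shape[1] - W) for j in range(Ny)]
    return [xp[u:u + H, v:v + W] for u in xs for v in ys]    # B = [B_1, ..., B_n]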
The specific method of step S5 is as follows:

S501, a large-kernel residual connection convolution module is constructed, comprising batch normalization layers, ReLU nonlinear activation layers, 3x3 convolution layers and a 7x7 depth convolution layer; the first 3x3 convolution layer is responsible for extracting local features and doubles the number of feature channels, the second 3x3 convolution layer is responsible for strengthening local feature extraction, and the 7x7 depth convolution layer uses large-kernel convolution to capture global features and increase the number of feature channels; the large-kernel residual connection convolution module is applied in the down-sampling process of the model: under the two different convolution operations, 3x3 and 7x7, the extracted global and local information is fused by addition, so that the module can capture local and global features at the same time, effectively alleviating the limitation of CNNs in modeling long-range dependencies;
S502, constructing downsampling according to the step S501, wherein the downsampling comprises a 3x3 convolution layer, a large kernel residual error connection convolution module and a maximum pooling layer, wherein the 3x3 convolution layer carries out channel weft lifting on initial input data of a network, the large kernel residual error connection convolution module carries out feature extraction, and the maximum pooling layer reduces feature resolution; the resolution of the features is reduced to half of the original resolution per 1 maximum pooling layer;
S503, a residual connection convolution module is constructed, consisting of batch normalization layers, ReLU nonlinear activation layers and 3x3 convolution layers; the residual connection convolution module first fuses the features extracted by down-sampling and the features extracted by up-sampling in an additive manner; the 3x3 convolution layer on the residual connection is responsible for extracting features and reducing the number of feature channels, and of the other two 3x3 convolution layers, the first is responsible for extracting features and halving the number of feature channels while the second strengthens feature extraction; the residual connection convolution module is applied in the up-sampling process of the model, where 3x3 convolution layers are used to extract features and the information extracted by the residual connection convolution and by the conventional convolutions is fused by addition, improving the segmentation performance of the model;
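A corresponding sketch of the decoder-side residual connection convolution module of S503; the additive skip fusion and the channel halving follow the text, while the remaining details are assumed.

import torch.nn as nn

class ResidualBlock(nn.Module):
    # Decoder block: fuses the skip (encoder) and up-sampled features by addition,
    # then halves the channel count: in_ch -> in_ch // 2.
    def __init__(self, in_ch):
        super().__init__()
        out_ch = in_ch // 2
        self.conv1 = nn.Sequential(nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
                                   nn.Conv2d(in_ch, out_ch, 3, padding=1))
        self.conv2 = nn.Sequential(nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
                                   nn.Conv2d(out_ch, out_ch, 3, padding=1))
        self.res = nn.Conv2d(in_ch, out_ch, 3, padding=1)   # 3x3 conv on the residual connection

    def forward(self, skip, up):
        x = skip + up                                        # additive fusion of encoder/decoder features
        return self.conv2(self.conv1(x)) + self.res(x)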
S504, constructing an adjacent layer feature fusion method, and carrying out adjacent layer feature fusion on the extracted features in the downsampling process of the step S502; when there are 3 adjacent layer features fused, it can be expressed as:
when there are 2 adjacent layer features fused, it can be expressed as:
wherein , and />Representing the fused features->Representing a feature map, wherein a subscript s represents a current scale, and a superscript (h, w, c) represents resolution and channel number at a corresponding scale; conv 2×2 (. Cndot.) represents a 2x2 convolutional layer with a step size of 2, the number of output channels being twice the number of input channels; upConv 2×2 (. Cndot.) represents a 2x2 transposed convolutional layer with a step size of 2, the number of output channels being half the number of input channels, conv 2×2 (. Cndot.) and upConv 2×2 (. Cndot.) downsampling and upsampling adjacent layer features, respectivelyThe same size as the current scale; />
Is an operation of connecting different features;
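A sketch of the adjacent-layer feature fusion of S504 for an interior scale fusing 3 layers: a stride-2 2x2 convolution down-samples the shallower feature, a stride-2 2x2 transposed convolution up-samples the deeper feature, and the three tensors are concatenated; the channel bookkeeping is an assumption for illustration.

import torch
import torch.nn as nn

class AdjacentLayerFusion(nn.Module):
    def __init__(self, c_prev, c_cur, c_next):
        super().__init__()
        # Down-sample the shallower (higher-resolution) feature, doubling its channels.
        self.down = nn.Conv2d(c_prev, 2 * c_prev, kernel_size=2, stride=2)
        # Up-sample the deeper (lower-resolution) feature, halving its channels.
        self.up = nn.ConvTranspose2d(c_next, c_next // 2, kernel_size=2, stride=2)

    def forward(self, x_prev, x_cur, x_next):
        # All three tensors are brought to the spatial size of the current scale
        # and concatenated along the channel dimension.
        return torch.cat([self.down(x_prev), x_cur, self.up(x_next)], dim=1)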
s505, constructing a GRN module, wherein GRN comprises three steps: global feature aggregation, feature normalization and feature calibration;
For an input feature X of size (H, W, C), written X ∈ R^{H×W×C}, where C is the number of feature channels, then:
1) Global feature aggregation
In the global feature aggregation process, the spatial features are aggregated into a vector through a function g, which can be expressed as:

g(X) := gx = { ||X_1||, ||X_2||, ..., ||X_C|| } ∈ R^C

By using the L2 norm, this yields one value for each channel feature, giving a set of aggregated values, where g(X)_i = ||X_i|| is a scalar that aggregates the statistics of the i-th channel; letting X_i be an n-dimensional feature, i.e. X_i = (x_1, x_2, x_3, ..., x_n), its L2 norm can be expressed as ||X_i|| = sqrt(x_1^2 + x_2^2 + ... + x_n^2);
2) Feature normalization
In the feature normalization process, the scalar normalization of the statistics of the i-th channel can be expressed as:

N(||X_i||) := ||X_i|| / Σ_{j=1..C} ||X_j||

where ||X_i|| is the L2 norm of the i-th channel and C denotes the current number of channels;
3) Feature calibration
In the feature calibration process, the feature normalization score computed in step 2) of step S505 is used to calibrate the original input response, which can be expressed as:

X_i = X_i · N(g(X)_i) ∈ R^{H×W}

where X_i denotes the i-th feature map, g(·) and N(·) denote global feature aggregation and feature normalization respectively, and H×W denotes the resolution of the current X_i;
Two additional learnable parameters γ and β are added and initialized to zero, and a residual connection is additionally added between the input and output of the GRN layer; the final GRN can be expressed as:

X_i = γ · (X_i · N(g(X)_i)) + β + X_i
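A sketch of the GRN layer of S505 for (N, C, H, W) tensors: per-channel L2-norm aggregation, divisive normalization across channels, calibration with the learnable γ and β, and a residual connection; the epsilon term is an implementation assumption.

import torch
import torch.nn as nn

class GRN(nn.Module):
    # Global Response Normalization for channels-first (N, C, H, W) tensors.
    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x):
        gx = torch.norm(x, p=2, dim=(2, 3), keepdim=True)       # g(X): per-channel L2 norm
        nx = gx / (gx.sum(dim=1, keepdim=True) + self.eps)      # N(||X_i||) = ||X_i|| / sum_j ||X_j||
        return self.gamma * (x * nx) + self.beta + x             # calibration + residual connection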
s506, constructing a large-core GRN channel response module according to the step S505; the module consists of a layer normalization layer, a 7x7 depth convolution layer, a GRN layer and a 3x3 convolution layer; the 7x7 depth convolution layer is responsible for feature extraction of features obtained by feature fusion of adjacent layers in the step S504, the GRN is responsible for global response normalization of extracted feature channels, and the 3x3 convolution layer is responsible for feature extraction and feature channel number reduction;
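A sketch of the large-kernel GRN channel response module of S506 (layer normalization, then a 7x7 depthwise convolution, then GRN, then a 3x3 convolution), reusing the GRN class sketched above; using GroupNorm(1, C) as the layer normalization and the output channel count are assumptions.

import torch.nn as nn

class LKGRNBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.norm = nn.GroupNorm(1, in_ch)                                  # per-sample layer normalization (assumed form)
        self.dwconv = nn.Conv2d(in_ch, in_ch, 7, padding=3, groups=in_ch)   # large-kernel depthwise convolution
        self.grn = GRN(in_ch)                                               # global response normalization of the fused channels
        self.reduce = nn.Conv2d(in_ch, out_ch, 3, padding=1)                # feature extraction + channel reduction

    def forward(self, x):
        return self.reduce(self.grn(self.dwconv(self.norm(x))))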
S507, up-sampling is constructed based on step S506 and step S503, consisting of residual connection convolution modules, 2x2 transposed convolution layers and a 1x1 convolution layer; the residual connection convolution module first fuses the features extracted by down-sampling and the features extracted by up-sampling in an additive manner and then extracts features, the 2x2 transposed convolution layers are responsible for increasing the feature resolution, with each 2x2 transposed convolution layer doubling the resolution, and the 1x1 convolution layer is responsible for mapping the features to the final segmentation result;
s508, according to the steps S502, S504, S506 and S507, finally forming the adjacent layer feature fusion Unet multi-organ segmentation network combined with the large-kernel depth convolution: ASF-LKUNet.
The specific method of the step S6 is as follows:
s601, constructing a loss function of the method by using a cross entropy loss function and a Dice loss function;
During the training of the ASF-LKUNet segmentation network, the loss function uses the cross-entropy loss L_CE and the Dice loss L_Dice, defined as follows:

L_CE = −(1/N) Σ_i Σ_c y_{i,c} · log(ŷ_{i,c})

L_Dice = 1 − (2 · Σ_i Σ_c y_{i,c} · ŷ_{i,c}) / (Σ_i Σ_c y_{i,c} + Σ_i Σ_c ŷ_{i,c})

where y denotes the label, ŷ denotes the predicted value of each category, i denotes a pixel in the feature map, c denotes a category, and N is the number of pixels;

During training, an Adam (adaptive moment estimation) optimizer is adopted; the partial derivative of the loss function J(θ) with respect to θ is computed and the parameter θ is updated in the negative gradient direction, θ′_j = θ_j − σ·∂J(θ)/∂θ_j, where θ′ is the updated network parameter, θ_j is the network parameter before the update, σ is the learning rate, x_i is the training data input to the network, h_θ(x_i) is the network output for training sample x_i, y_i is the label corresponding to the training sample, and m is the number of samples input in each training step; a group of samples is randomly drawn from the training set, and the parameters are updated according to the gradient descent rule after each training step;
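A sketch of the combined cross-entropy + Dice loss of S601; the equal weighting of the two terms and the softmax-based soft Dice form are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CEDiceLoss(nn.Module):
    def __init__(self, num_classes, ce_weight=0.5, dice_weight=0.5, eps=1e-5):
        super().__init__()
        self.num_classes, self.eps = num_classes, eps
        self.ce_weight, self.dice_weight = ce_weight, dice_weight
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits, target):
        # logits: (N, C, H, W); target: (N, H, W) with integer class labels.
        loss_ce = self.ce(logits, target)
        probs = F.softmax(logits, dim=1)
        onehot = F.one_hot(target, self.num_classes).permute(0, 3, 1, 2).float()
        inter = (probs * onehot).sum(dim=(0, 2, 3))
        denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
        loss_dice = 1.0 - ((2 * inter + self.eps) / (denom + self.eps)).mean()
        return self.ce_weight * loss_ce + self.dice_weight * loss_dice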
s602, training a model by using the training set in the data set constructed in the step S1, the data obtained in the step S3 and the data obtained by enhancing the data in the step S4, and selecting the model with the highest evaluation index in the training process; the evaluation index uses a Dice coefficient (DSC), the DSC generally measures the similarity of two samples, the value range is [0,1], and the higher the DSC value is, the higher the similarity of the two samples is, which is defined as:
DSC(Ŷ, Y) = 2·|Ŷ ∩ Y| / (|Ŷ| + |Y|)

where Ŷ denotes the network segmentation feature map and Y denotes the label feature map.
The segmentation system based on the adjacent layer feature fusion Unet multi-organ segmentation method combined with large-kernel convolution comprises the following components:
a large-kernel residual connection convolution encoder, used for taking the slices B as network input and extracting global and local feature information;

a residual connection convolution decoder, used for outputting the segmentation result map and extracting multi-resolution deep features;
The adjacent layer feature fusion module is used for fusing the features of the adjacent layers and can obtain low-layer features with more details and high-layer features with more semantics;
the large-core GRN channel response module is used for carrying out global response normalization on the fused characteristic channels to enhance channel selection, and enhancing global and local information extraction through large-core depth convolution, so that the model can fully utilize different characteristics and improve the capability of capturing global and local information of the model.
The segmentation equipment based on the adjacent layer feature fusion Unet multi-organ segmentation method combined with large-kernel convolution comprises the following components:
a memory for storing a computer program, data and a model;
and a processor, used for implementing, when executing the computer program, the adjacent layer feature fusion Unet multi-organ segmentation method combining large-kernel convolution of any of steps S1 to S7.
A computer readable storage medium, responsible for reading and storing programs and data, wherein the computer readable storage medium stores a computer program which, when executed by a processor, can segment organ images based on the adjacent layer feature fusion UNet multi-organ segmentation method combining large-kernel convolution of steps S1 to S7.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a large-kernel residual connection convolution method (LK Residual Block). The residual connection facilitates training, alleviates degradation and mitigates overfitting, which is especially valuable for medical images with limited labeled samples; using large-kernel depth convolution on the residual link allows local and global information to be captured at the same time, effectively alleviating the limitation of CNNs in modeling long-range dependencies, and gives the model a ViT-like ability to capture global information while using fewer parameters and requiring less labeled data and computational resources than ViT architectures. The LK Residual Block method achieves better segmentation results than conventional residual connection methods such as ResUNet.
2. To address the problems that fully-connected feature fusion leads to high model computational complexity and that the fused features are neither effectively exploited nor explored deeply enough, an adjacent-layer feature fusion and large-kernel Global Response Normalization (GRN) channel response method (LKGRN) is proposed. Unlike fully-connected feature fusion, the adjacent-layer feature fusion method fuses adjacent features in series, which effectively reduces computational complexity while still fusing low-level features with more detail and high-level features with more semantics, thereby improving segmentation performance. The LKGRN method adaptively selects more meaningful channel information for the fused features and strengthens inter-channel feature extraction through a large-kernel depth convolution channel response improved on the basis of GRN. The GRN increases channel contrast and selectivity, explores the relationships among channels, and effectively utilizes and attends to the fused features without introducing additional parameters, further reducing complexity; the large-kernel depth convolution used in LKGRN effectively alleviates the problem of purely local attention, and the superior performance of LKGRN has been demonstrated on multi-organ datasets.
3. The method realizes good segmentation performance with lower complexity and parameter quantity, and can effectively and efficiently segment medical images by utilizing the large-core residual error connection, the adjacent layer feature fusion and the large-core GRN channel response method.
In conclusion, the invention uses few parameters, reduces the complexity of organ segmentation, and offers good practicality and high efficiency.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is an overall structure diagram of ASF-LKUNet of the present invention.
FIG. 3 is a block diagram of a large kernel residual join convolution of the present invention.
Fig. 4 is a block diagram of a residual join convolution of the present invention.
FIG. 5 is a block diagram of a large core GRN channel response module of the invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and examples.
The adjacent layer feature fusion Unet multi-organ segmentation method combining large-core convolution is characterized in that after data set division, data preprocessing, data sampling and data enhancement are carried out on a sample containing a tag, an adjacent layer feature fusion UNet multi-organ segmentation model combining large-core depth convolution is constructed, and an encoder network, a base network of a decoder, an adjacent layer feature fusion module and a large-core GRN channel response module are constructed; adding large-core depth convolution in an encoder network, combining the large-core depth convolution with 3×3 convolution, and combining local information and global information; constructing adjacent layer feature fusion modules, so that the model fully utilizes information among different layer features to obtain lower layer features with more details and higher layer features with more semantics; and constructing a large-core GRN channel response module, modeling long-distance dependency on the characteristics fused by the characteristics of adjacent layers, and under the condition that the number of characteristic channels is increased, carrying out global response normalization on the channels fused with the characteristics by the large-core GRN channel response module, so that the channels are compared and selected, and the segmentation performance of the whole model is improved by utilizing the fused characteristics.
Referring to fig. 1, a method for segmenting multiple organs by combining large-kernel convolution and feature fusion of adjacent layers specifically comprises the following steps:
s1, data set division
Randomly dividing a sample containing a label in the image data set into a training set and a testing set;
s2, data preprocessing
The image data are resampled after the data set is divided, to eliminate differences between images from different sources and to facilitate the calculation and comparison of features in the images: the 3D CT data are resampled to the same spacing of 1×1×3 mm³, where bilinear interpolation is used for the image samples and nearest-neighbor interpolation is used for the label samples;
The invention is mainly aimed at the segmentation of human organs such as the aorta, gallbladder, left kidney, right kidney, liver, pancreas, spleen and stomach, so the CT value range is chosen as [−125, 275]; in addition, the data are normalized to eliminate the adverse effect of singular sample data and to accelerate the convergence of network training, according to the formula:

R = (I − min(I)) / (max(I) − min(I))

where R represents the CT data after normalization, W_r and H_r represent the width and height of the resolution of the normalized CT data, Z_r represents the number of slices, I represents the CT values before normalization, max(I) is the maximum CT value, and min(I) is the minimum CT value; max(I) is taken as 275 and min(I) as −125.
Centering on the target slice and stacking the slices above and below it as the network input: first, a target slice is selected along the z-axis of the normalized R, the adjacent slices are stacked with the target slice as the center, and a volume of size W_r × H_r × s is taken as the input of the network, where s represents the number of stacked adjacent slices; if the number of slices to stack is insufficient, corresponding mirror filling is performed; the whole process is as follows:
assuming i represents a slice at the mid-z-axis position of the 3D sample, the network input can be expressed as:
X=[i-1,i,i+1]
where X represents the input to the network and W_r and H_r represent the width and height of the original resolution of the CT data; when i = 1, i.e. i is the first slice along the z-axis of the 3D sample, X = [1, 1, i+1]; when i = i_max, i.e. i is the last slice along the z-axis of the 3D sample, X = [i−1, i_max, i_max];
S3, data sampling
Because the sample resolutions of different CT data are inconsistent, the network requires a fixed input resolution, and the input size of the network is proportional to the training time. Therefore, in order to reduce the network training time while ensuring that all content of every data sample participates in training, the invention takes a size of 256×256×3 as the network input and samples the data X obtained by the processing in step S2, i.e. X is traversed and slices of size 256×256×3 are sampled sequentially; in sequential sampling, a slice window is moved over X with step size S, sampling X from left to right and from top to bottom, where H and W are both 256; if the slice window falls outside the range of X during sampling, it is shifted back so that it stays within X; if H and W are greater than W_r and H_r of X, X is padded around its center; the specific process is as follows:
S301, first, the fill lengths of X along the x and y axes are calculated as follows:

P_x = (S_x − ((H_r − H) mod S_x)) mod S_x

P_y = (S_y − ((W_r − W) mod S_y)) mod S_y

where P_x is the fill length of X along the x-axis, P_y is the fill length of X along the y-axis, S_x and S_y denote the sampling step sizes in the x and y directions (in the invention, S_x and S_y are both 128), mod denotes the remainder operation, H_r and W_r represent the height and width of the original resolution of the CT data, and H and W denote the height and width of the slice window (in the invention, H and W are both 256); if H_r and W_r of X are greater than H and W, X is padded along its x and y axes by P_x and P_y respectively; if H is greater than H_r, X is padded along its x-axis by P_x; if W is greater than W_r, X is padded along its y-axis by P_y;
S302, the number of slices along the x-axis and y-axis directions is calculated according to step S301;

Let the number of slices in the x direction be N_x; it can be expressed as:

N_x = (H_r + P_x − H) | S_x + 1

where | denotes integer division;

The number of slices in the y direction, N_y, can be expressed as:

N_y = (W_r + P_y − W) | S_y + 1
S303, the coordinates are calculated from the slice window size and the step S; the coordinates x′ in the x direction can be expressed as:

x′ = [x′_1, x′_2, ..., x′_i, ..., x′_n]

where x′_i = (i−1)·S_x, i = 1, ..., n, n denotes the number of slices N_x in the x direction, and when i = n, x′_n = H_r − H;

The coordinates y′ in the y direction can be expressed as:

y′ = [y′_1, y′_2, ..., y′_i, ..., y′_n]

where y′_i = (i−1)·S_y, i = 1, ..., n, n denotes the number of slices N_y in the y direction, and when i = n, y′_n = W_r − W;
S304, the slice window of size H×W is moved over X with step S, sampling sequentially from left to right and from top to bottom in the xy plane of X to obtain the slices B; the specific process is as follows:

From the x′ and y′ coordinates of step S303, the upper-left-corner coordinates v of the patches to be sampled from X are obtained:

v = [v_1, v_2, ..., v_n]

where n = N_x·N_y and each v_i is a pair of coordinates (x′_j, y′_k);

The window is positioned on X according to the coordinates v and a patch of the slice window size is cropped to obtain the sampling slices B, according to the formula:

B = [B_1, B_2, ..., B_n]

where n = N_x·N_y and B_i = X(v_i); B_i denotes the i-th sampled slice, and X(v_i) denotes locating the coordinate v_i in X and then cropping a patch of the slice window size from X to obtain the sampling slice;
s4, data enhancement
Because data samples are scarce, the training data must be expanded by data augmentation to avoid the overfitting caused by training with few samples; the data augmentation is realized by, with a given probability, horizontally flipping, vertically flipping, rotating by −90° and 90°, and horizontally translating left and right the data processed in step S3;
S5, constructing a multi-organ segmentation network
And constructing an adjacent layer feature fusion Unet multi-organ segmentation network combined with large-kernel depth convolution, and naming the network as ASF-LKUNet. Please refer to fig. 2;
S501, a large-kernel residual connection convolution module is constructed, consisting of 2 batch normalization layers, 2 ReLU nonlinear activation layers, 2 3x3 convolution layers and 1 7x7 depth convolution layer; please refer to fig. 3. The first 3x3 convolution layer is responsible for extracting local features and doubles the number of feature channels, the second convolution layer is responsible for strengthening local feature extraction, and the 7x7 depth convolution layer uses large-kernel convolution to capture global features and increase the number of feature channels. The module is applied in the down-sampling process of the model: under the two different convolution operations, 3x3 and 7x7, the extracted global and local information is fused by addition, so that the module can capture local and global features at the same time, effectively alleviating the limitation of CNNs in modeling long-range dependencies; moreover, compared with segmentation models of the ViT architecture, the 7x7 depth convolution layer has fewer parameters and requires less labeled data and computational resources;
S502, down-sampling is constructed according to S501, comprising 1 3x3 convolution layer, 4 large-kernel residual connection convolution modules and 4 2x2 max pooling layers; the 3x3 convolution layer is responsible for raising the channel dimension of the initial input data of the network, the large-kernel residual connection convolution modules are responsible for feature extraction, and the max pooling layers are responsible for reducing the feature resolution; each max pooling layer halves the feature resolution;
S503, constructing a residual connection convolution module, wherein the residual connection convolution module consists of 2 batch normalization layers, 2 ReLU nonlinear activation layers and 3x3 convolution layers; please refer to fig. 4; the residual connection convolution module firstly fuses the features extracted by downsampling and the features extracted by upsampling in an additive mode, a 3x3 convolution layer connected with the residual is responsible for extracting the features and reducing the number of feature channels, in the other two 3x3 convolution layers, the first 3x3 convolution layer is responsible for extracting the features and reducing the number of feature channels to half of the original number, and the second convolution layer is responsible for enhancing the feature extraction; the residual connection convolution module is used for extracting features by using 3x3 convolution layers in the up-sampling process of the model, and the residual connection convolution and information extracted by conventional convolution are fused in an addition mode, so that the segmentation performance of the model is improved;
S504, an adjacent layer feature fusion method is constructed, and adjacent layer feature fusion is performed on the features extracted in the down-sampling process of step S502. When the features of 3 adjacent layers are fused, the fused feature at the current scale is obtained by concatenating the down-sampled feature of the shallower adjacent layer, the feature of the current layer and the up-sampled feature of the deeper adjacent layer; when the features of 2 adjacent layers are fused (at the shallowest and deepest scales), the fused feature is obtained by concatenating the feature of the current layer with the resampled feature of its single adjacent layer;

where the fused feature and the feature maps are written with a subscript s denoting the current scale and a superscript (h, w, c) denoting the resolution and number of channels at the corresponding scale; Conv_{2×2}(·) denotes a 2x2 convolution layer with step size 2 whose number of output channels is twice the number of input channels; upConv_{2×2}(·) denotes a 2x2 transposed convolution layer with step size 2 whose number of output channels is half the number of input channels; Conv_{2×2}(·) and upConv_{2×2}(·) respectively down-sample and up-sample the adjacent-layer features to the same size as the current scale; concatenation is the operation of connecting the different features;
s505, constructing a GRN module, wherein GRN comprises three steps: global feature aggregation, feature normalization and feature calibration.
For an input feature X of size (H, W, C), written X ∈ R^{H×W×C}, the following steps are performed:
1) Global feature aggregation
In the global feature aggregation process, the spatial features are aggregated into a vector through a function g, which can be expressed as:

g(X) := gx = { ||X_1||, ||X_2||, ..., ||X_C|| } ∈ R^C

By using the L2 norm, this yields one value for each channel feature, giving a set of aggregated values, where g(X)_i = ||X_i|| is a scalar that aggregates the statistics of the i-th channel; letting X_i be an n-dimensional feature, i.e. X_i = (x_1, x_2, x_3, ..., x_n), its L2 norm can be expressed as ||X_i|| = sqrt(x_1^2 + x_2^2 + ... + x_n^2);
2) Feature normalization
In the feature normalization process, the scalar normalization of the statistics of the i-th channel can be expressed as:

N(||X_i||) := ||X_i|| / Σ_{j=1..C} ||X_j||

where ||X_i|| is the L2 norm of the i-th channel and C denotes the current number of channels;
3) Feature calibration
In the feature calibration process, the feature normalization score computed in step 2) of step S505 is used to calibrate the original input response, which can be expressed as:

X_i = X_i · N(g(X)_i) ∈ R^{H×W}

where X_i denotes the i-th feature map, g(·) and N(·) denote global feature aggregation and feature normalization respectively, and H×W denotes the resolution of the current X_i;
To simplify the optimization, two additional learnable parameters γ and β need to be added and initialized to zero, and a residual connection is additionally added between the input and output of the GRN layer; the final GRN can be expressed as:

X_i = γ · (X_i · N(g(X)_i)) + β + X_i
S506, a large-kernel GRN channel response module is constructed according to step S505; the module consists of a layer normalization layer, a 7x7 depth convolution layer, a GRN layer and a 3x3 convolution layer; please refer to fig. 5. The 7x7 depth convolution layer is responsible for feature extraction on the features obtained by the adjacent layer feature fusion of step S504, the GRN is responsible for global response normalization of the extracted feature channels, and the 3x3 convolution layer is responsible for feature extraction and for reducing the number of feature channels; thus, in the large-kernel GRN channel response module, the features of adjacent layers are fused, low-level details are combined with high-level semantics, and global and local information extraction and channel selection are enhanced through the GRN-based improved large-kernel channel response, so that the model can fully utilize the different features and its ability to capture global and local information is improved.
S507, up-sampling is constructed based on step S506 and step S503, consisting of 4 residual connection convolution modules, 4 2x2 transposed convolution layers and 1 1x1 convolution layer; the residual connection convolution module first fuses the features extracted by down-sampling and the features extracted by up-sampling in an additive manner and then extracts features, the 2x2 transposed convolution layers are responsible for increasing the feature resolution, with each 2x2 transposed convolution layer doubling the resolution, and the 1x1 convolution layer is responsible for mapping the features to the final segmentation result;
according to step S502, step S504, step S506 and step S507, finally forming a neighboring layer feature fusion Unet multi-organ segmentation network combined with large-kernel depth convolution: ASF-LKUNet; please refer to fig. 2.
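A high-level sketch of how the pieces of ASF-LKUNet fit together (stem 3x3 convolution, large-kernel residual encoder with max pooling, adjacent-layer fusion feeding large-kernel GRN modules on the skip connections, residual decoder with transposed convolutions, 1x1 output head). It reuses the module sketches above; the number of scales, channel widths and wiring details are simplified assumptions for illustration, not the patent's reference implementation.

import torch
import torch.nn as nn

class ASFLKUNetSketch(nn.Module):
    # Simplified 3-scale illustration; the described network uses 4 encoder stages.
    def __init__(self, in_ch=3, num_classes=9, c=32):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, c, 3, padding=1)            # raises the channel dimension of the input
        self.enc1, self.enc2, self.enc3 = LKResidualBlock(c), LKResidualBlock(2 * c), LKResidualBlock(4 * c)
        self.pool = nn.MaxPool2d(2)
        # Resampling layers that bring adjacent-layer features to the current scale.
        self.up21 = nn.ConvTranspose2d(4 * c, 2 * c, 2, stride=2)
        self.down12 = nn.Conv2d(2 * c, 4 * c, 2, stride=2)
        self.up32 = nn.ConvTranspose2d(8 * c, 4 * c, 2, stride=2)
        self.down23 = nn.Conv2d(4 * c, 8 * c, 2, stride=2)
        # Large-kernel GRN channel response on each fused skip connection.
        self.lkgrn1 = LKGRNBlock(4 * c, 2 * c)      # scale 1: 2c + 2c  -> 2c
        self.lkgrn2 = LKGRNBlock(12 * c, 4 * c)     # scale 2: 4c+4c+4c -> 4c
        self.lkgrn3 = LKGRNBlock(16 * c, 8 * c)     # scale 3: 8c + 8c  -> 8c
        # Decoder: transposed convolutions + residual connection convolution modules.
        self.up_d3 = nn.ConvTranspose2d(8 * c, 4 * c, 2, stride=2)
        self.dec2 = ResidualBlock(4 * c)            # -> 2c
        self.up_d2 = nn.ConvTranspose2d(2 * c, 2 * c, 2, stride=2)
        self.dec1 = ResidualBlock(2 * c)            # -> c
        self.head = nn.Conv2d(c, num_classes, 1)    # 1x1 conv maps to the segmentation result

    def forward(self, x):
        e1 = self.enc1(self.stem(x))                # 2c, full resolution
        e2 = self.enc2(self.pool(e1))               # 4c, 1/2 resolution
        e3 = self.enc3(self.pool(e2))               # 8c, 1/4 resolution
        s1 = self.lkgrn1(torch.cat([e1, self.up21(e2)], dim=1))
        s2 = self.lkgrn2(torch.cat([self.down12(e1), e2, self.up32(e3)], dim=1))
        s3 = self.lkgrn3(torch.cat([self.down23(e2), e3], dim=1))
        d2 = self.dec2(s2, self.up_d3(s3))          # 2c, 1/2 resolution
        d1 = self.dec1(s1, self.up_d2(d2))          # c, full resolution
        return self.head(d1)

In this sketch the deepest fused feature directly starts the decoding path; the actual network may use a dedicated bottleneck stage between the encoder and decoder.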
S6, training an ASF-LKUNet segmentation network constructed in the step S5;
s601, constructing a loss function of the method by using a cross entropy loss function and a Dice loss function;
During the training of the segmentation network, the loss function uses the cross-entropy loss L_CE and the Dice loss L_Dice, defined as follows:

L_CE = −(1/N) Σ_i Σ_c y_{i,c} · log(ŷ_{i,c})

L_Dice = 1 − (2 · Σ_i Σ_c y_{i,c} · ŷ_{i,c}) / (Σ_i Σ_c y_{i,c} + Σ_i Σ_c ŷ_{i,c})

where y denotes the label, ŷ denotes the predicted value of each category, i denotes a pixel in the feature map, c denotes a category, and N is the number of pixels;

During training, an Adam (adaptive moment estimation) optimizer is adopted; first, the partial derivative of the loss function J(θ) with respect to θ is computed and the parameter θ is updated in the negative gradient direction, θ′_j = θ_j − σ·∂J(θ)/∂θ_j, where θ′ is the updated network parameter, θ_j is the network parameter before the update, σ is the learning rate, x_i is the training data input to the network, h_θ(x_i) is the network output for training sample x_i, y_i is the label corresponding to the training sample, and m is the number of samples input in each training step; a group of samples is randomly drawn from the training set, and the parameters are updated according to the gradient descent rule after each training step;
s602, training a model by using the training set in the data set constructed in the step S1, the data obtained in the step S3 and the data obtained by enhancing the data in the step S4, and selecting the model with the highest evaluation index in the training process; the evaluation index uses a Dice coefficient (DSC), the DSC generally measures the similarity of two samples, the value range is [0,1], and the higher the DSC value is, the higher the similarity of the two samples is, which is defined as:
DSC(Ŷ, Y) = 2·|Ŷ ∩ Y| / (|Ŷ| + |Y|)

where Ŷ denotes the network segmentation feature map and Y denotes the label feature map;
S7, the optimal model obtained by training in step S6 is tested using the test set from step S1 and the data processed as in step S3, and the segmentation performance of the model is quantitatively evaluated using the DSC coefficient and the 95th-percentile Hausdorff distance (HD95); HD95 calculates the distance between two sets, and the smaller the value, the higher the similarity between the two sets; it is calculated as follows:

HD95(Y, Ŷ) = max( d(Y, Ŷ), d(Ŷ, Y) )

where d(Y, Ŷ) is the one-way Hausdorff distance from the label feature map Y to the segmentation feature map Ŷ, d(Ŷ, Y) is the one-way Hausdorff distance from the segmentation feature map Ŷ to the label feature map Y, and max(·) is computed by sorting the distances between the boundary points of Y and Ŷ from small to large and taking the distance at the 95th percentile;
the higher the DSC coefficient and the lower the HD95, the better the representative model segmentation performance, and thus the segmentation performance of the model is comprehensively evaluated.
The invention provides a large-kernel residual connection convolution method (LK Residual Block). The residual connection facilitates training, alleviates degradation and mitigates overfitting, which is especially valuable for medical images with limited labeled samples; using large-kernel depth convolution on the residual link allows local and global information to be captured at the same time, effectively alleviating the limitation of CNNs in modeling long-range dependencies, and gives the model a ViT-like ability to capture global information while using fewer parameters and requiring less labeled data and computational resources than ViT architectures. The LK Residual Block method achieves better segmentation results than conventional residual connection methods such as ResUNet.
The invention also addresses the problems that fully-connected feature fusion leads to high model computational complexity and that the fused features are neither effectively exploited nor explored deeply enough, by proposing an adjacent-layer feature fusion and large-kernel Global Response Normalization (GRN) channel response method (LKGRN). Unlike fully-connected feature fusion, the adjacent-layer feature fusion method fuses adjacent features in series, which effectively reduces computational complexity while still fusing low-level features with more detail and high-level features with more semantics, thereby improving segmentation performance. The LKGRN method adaptively selects more meaningful channel information for the fused features and strengthens inter-channel feature extraction through a large-kernel depth convolution channel response improved on the basis of GRN. The GRN increases channel contrast and selectivity, explores the relationships among channels, and effectively utilizes and attends to the fused features without introducing additional parameters, further reducing complexity; the large-kernel depth convolution used in LKGRN effectively alleviates the problem of purely local attention, and the superior performance of LKGRN has been demonstrated on multi-organ datasets.

Claims (9)

1. The adjacent layer feature fusion Unet multi-organ segmentation method combining large-kernel convolution is characterized in that, after data set division, data preprocessing, data sampling and data enhancement are performed on the labeled samples, an adjacent layer feature fusion UNet multi-organ segmentation model combining large-kernel depthwise convolution is constructed; an adjacent layer feature fusion module is constructed so that the model fully utilizes the information among features of different layers to obtain lower-layer features with more detail and higher-layer features with more semantics; and a large-kernel GRN channel response module is constructed to model long-range dependencies on the features fused from adjacent layers, and, as the number of feature channels increases, to perform global response normalization on the fused feature channels so that the channels are contrasted and selected, whereby the fused features are utilized to improve the segmentation performance of the whole model.
2. The adjacent layer feature fusion Unet multi-organ segmentation method combining large-kernel convolution according to claim 1, wherein constructing the adjacent layer feature fusion Unet multi-organ segmentation model combining large-kernel convolution comprises: constructing the base networks of the encoder and the decoder, the adjacent layer feature fusion module and the large-kernel GRN channel response module; and adding large-kernel depthwise convolution in the encoder network and combining it with 3×3 convolution to combine local and global information.
3. The method for segmenting the multiple organs by combining large-kernel convolution and fusion of adjacent layer features according to claim 1 or 2 is characterized by comprising the following steps:
S1, data set division
Randomly dividing a sample containing a label in the image data set into a training set and a testing set;
S2, data preprocessing
Resampling the image data after the data set is divided, to eliminate differences between images from different sources and to facilitate the calculation and comparison of features in the images: the 3D CT data are resampled to the same resolution, where the image samples are resampled with bilinear interpolation and the label samples with nearest-neighbor interpolation;
the data are normalized to eliminate the adverse effect of singular sample values and to accelerate the convergence of network training, according to the formula:
R = (I − min(I)) / (max(I) − min(I)), R ∈ R^{Wr×Hr×Zr}
where R represents the CT data after normalization, Wr and Hr represent the width and height of the resolution after normalization, Zr represents the number of slices, I represents the CT values before normalization, max(I) is the maximum CT value and min(I) is the minimum CT value;
centering on the target slice and stacking the slices above and below it as the network input: first, a target slice is selected along the z axis of the normalized R, the adjacent slices are stacked around the target slice as center, and a volume of size Wr×Hr×s is taken as the input of the network, where s is the number of stacked adjacent slices; if the number of available slices is insufficient, corresponding mirror filling is performed; the whole process is as follows:
Assuming that i represents a slice at the z-axis position in R after normalization, the network input can be expressed as:
X=[i-1,i,i+1]
where X represents the input of the network and Wr and Hr represent the width and height of the original resolution of the CT data; when i=1, i.e. i is the first slice along the z axis of R, X=[1,1,i+1]; when i=i_max, i.e. i is the last slice along the z axis of R, X=[i-1,i_max,i_max] (an illustrative sketch of this preprocessing follows this claim);
S3, data sampling
Sampling the data X obtained by the processing of step S2, i.e. traversing X and sampling sequentially; the sequential sampling takes slices of size H×W with step length S, moving over X from left to right and from top to bottom, where H and W denote the height and width of the slice; if a slice falls outside the extent of X during sampling, the sampling position is pulled back so that the slice lies within X, and if H and W are greater than Hr and Wr of X, X is padded around its borders;
S4, data enhancement
Data augmentation is performed to expand the training data and to avoid the overfitting caused by training with few samples; the data processed in step S3 are, with a certain probability, horizontally flipped, vertically flipped, rotated by -90 and 90 degrees, and translated horizontally left and right to realize the data augmentation;
S5, constructing an adjacent layer feature fusion Unet multi-organ segmentation network combined with large-kernel depthwise convolution, and naming the network ASF-LKUNet;
S6, training an ASF-LKUNet segmentation network constructed in the step S5;
S7, testing the optimal model obtained by the training of step S6, using the test set from step S1 and the data obtained by the processing of step S3, and quantitatively evaluating the segmentation performance of the model with the DSC coefficient and the 95th-percentile Hausdorff distance (HD95); HD95 measures the distance between two point sets, and the smaller the value, the smaller the distance between the two sets; it is calculated as:
HD95(Ŷ, Y) = max(d_95(Y, Ŷ), d_95(Ŷ, Y))
where d_95(Y, Ŷ) is the one-directional Hausdorff distance from the label feature map Y to the segmentation feature map Ŷ, d_95(Ŷ, Y) is the one-directional Hausdorff distance from Ŷ to Y, and each one-directional distance is obtained by sorting the distances between the boundary points of Y and Ŷ from small to large and taking the distance at the 95% position;
the higher the DSC coefficient and the lower the HD95, the better the segmentation performance of the model, so the two metrics together give a comprehensive evaluation of the model's segmentation performance.
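As an editorial illustration of step S2 of this claim (not part of the claimed method), the min-max normalization and the adjacent-slice stacking with edge duplication can be sketched in Python as follows; the function names, the 0-based slice index and the fixed stack size s = 3 are assumptions of the sketch, and the resampling to a common resolution is omitted.

import numpy as np

def normalize_ct(volume):
    # R = (I - min(I)) / (max(I) - min(I)): map the CT values into [0, 1].
    v_min, v_max = volume.min(), volume.max()
    return (volume - v_min) / (v_max - v_min + 1e-8)

def stack_adjacent_slices(volume, i):
    # Build the network input X = [i-1, i, i+1] for target slice index i.
    # volume has shape (Z, H, W); a missing neighbour at either end of the
    # volume is replaced by the target slice itself (the "mirror filling").
    z = volume.shape[0]
    lo = i - 1 if i - 1 >= 0 else i
    hi = i + 1 if i + 1 < z else i
    return np.stack([volume[lo], volume[i], volume[hi]], axis=0)  # (3, H, W)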
4. The method for segmenting multiple organs by combining large-kernel convolution and fusion of adjacent layer features as set forth in claim 3, wherein the specific process of data sampling in the step S3 is as follows:
S301, first calculate the filling lengths of X along the x and y axes:
P_x = (S_x − ((Hr − H) mod S_x)) mod S_x
P_y = (S_y − ((Wr − W) mod S_y)) mod S_y
where P_x is the filling length of X along the x axis, P_y is the filling length of X along the y axis, S_x and S_y are the step sizes in the x and y directions during sampling (S_x and S_y take the same step length), mod denotes the remainder operation, Hr and Wr are the height and width of the original resolution of the CT data, and H and W are the height and width of the slice; if H and W are greater than Hr and Wr of X, X is padded by P_x and P_y along its x and y axes respectively; if only H is greater than Hr, P_x is filled along the x axis of X, and if only W is greater than Wr, P_y is filled along the y axis of X;
S302, calculating the number of slices in the directions of the x axis and the y axis according to the step S301;
let the number of slices in the x-direction be N x The calculation method can be expressed as:
N x =(H r +P x -H)|S x +1
wherein, i represents integer division;
number of slices in y-direction N y The calculation method can be expressed as:
N y =(H r +P y -H)|S y +1
S303, calculate the coordinates from N and S; the coordinates x′ in the x direction can be expressed as:
x′ = [x′_1, x′_2, ..., x′_i, ..., x′_n]
where x′_i = (i−1)*S_x, i = 1, ..., n, and n is the number of slices N_x in the x direction; when i = n, x′_n = Hr − H;
the coordinates y′ in the y direction can be expressed as:
y′ = [y′_1, y′_2, ..., y′_i, ..., y′_n]
where y′_i = (i−1)*S_y, i = 1, ..., n, and n is the number of slices N_y in the y direction; when i = n, y′_n = Wr − W;
S304, with slice size H×W and step S, sample sequentially from left to right and from top to bottom over the x and y directions of X to obtain the slices B; the specific process is as follows:
from the coordinates x′ and y′ of step S303, the top-left-corner coordinates v of the samples taken from X are obtained:
v = [v_1, v_2, ..., v_n] = [(x′_i, y′_j)], i = 1, ..., N_x, j = 1, ..., N_y
where n = N_x * N_y;
X is positioned at the coordinate v and cropped to the slice size H×W to obtain the sampling slices B:
B = [B_1, B_2, ..., B_n]
where n = N_x * N_y and B_i = X(v_i); B_i denotes the i-th sampled slice, and X(v_i) denotes positioning at the coordinate v_i in X and then cropping X to the slice size H×W to obtain the sampled slice (an illustrative sketch of this sampling follows this claim).
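As an editorial illustration of steps S301 to S304 of this claim (not part of the claimed method), the sequential patch sampling can be sketched in Python as follows; the patch size, the stride, and the choice to pad only the bottom and right borders with edge values are assumptions of the sketch.

import numpy as np

def sample_patches(x, patch_hw=(256, 256), stride=(128, 128)):
    # x has shape (C, Hr, Wr); returns a list of (C, H, W) patches.
    _, hr, wr = x.shape
    h, w = patch_hw
    sx, sy = stride
    # S301: padding so that the stride tiles the (possibly enlarged) image.
    px = (sx - ((hr - h) % sx)) % sx
    py = (sy - ((wr - w) % sy)) % sy
    x = np.pad(x, ((0, 0), (0, px), (0, py)), mode="edge")
    # S302: number of patches along each axis (integer division).
    nx = (hr + px - h) // sx + 1
    ny = (wr + py - w) // sy + 1
    # S303/S304: top-left coordinates, then crop left-to-right, top-to-bottom.
    patches = []
    for i in range(nx):
        for j in range(ny):
            r, c = i * sx, j * sy
            patches.append(x[:, r:r + h, c:c + w])
    return patches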
5. The method for segmenting multiple organs by combining large-kernel convolution and fusion of adjacent layer features according to claim 3, wherein the specific method of S5 is as follows:
S501, construct a large-kernel residual connection convolution module comprising: a batch normalization layer, a ReLU nonlinear activation layer, 3x3 convolution layers and a 7x7 depthwise convolution layer; the first 3x3 convolution layer extracts local features and doubles the number of feature channels, the second 3x3 convolution layer strengthens the extraction of local features, and the 7x7 depthwise convolution layer uses the large kernel to capture global features and to increase the number of feature channels; the large-kernel residual connection convolution module is applied in the downsampling path of the model: the global information and the local information extracted by the two different 3x3 and 7x7 convolutions are fused by addition, so that the module captures local and global features at the same time and effectively relieves the limited long-range dependency modeling of CNNs;
S502, construct the downsampling path from step S501, comprising a 3x3 convolution layer, the large-kernel residual connection convolution module and a max pooling layer, where the 3x3 convolution layer raises the channel dimension of the initial input data of the network, the large-kernel residual connection convolution module performs feature extraction, and the max pooling layer reduces the feature resolution; each max pooling layer reduces the feature resolution to half of the original;
S503, construct a residual connection convolution module consisting of a batch normalization layer, a ReLU nonlinear activation layer and 3x3 convolution layers; the residual connection convolution module first fuses the features extracted by downsampling and the features extracted by upsampling through addition; the 3x3 convolution layer on the residual connection extracts features and reduces the number of feature channels, while of the other two 3x3 convolution layers the first extracts features and reduces the number of feature channels to half of the original and the second strengthens the feature extraction; the residual connection convolution module is applied in the upsampling path of the model, where 3x3 convolution layers extract features and the information extracted by the residual connection convolution and by the conventional convolution is fused by addition, improving the segmentation performance of the model;
S504, construct the adjacent layer feature fusion method and perform adjacent layer feature fusion on the features extracted in the downsampling of step S502; when 3 adjacent layer features are fused, it can be expressed as:
F̃_s = (conv_{2×2}(f_{s−1}^{(2h,2w,c/2)}), f_s^{(h,w,c)}, upConv_{2×2}(f_{s+1}^{(h/2,w/2,2c)}))
when 2 adjacent layer features are fused, it can be expressed as:
F̂_s = (conv_{2×2}(f_{s−1}^{(2h,2w,c/2)}), f_s^{(h,w,c)}) or F̂_s = (f_s^{(h,w,c)}, upConv_{2×2}(f_{s+1}^{(h/2,w/2,2c)}))
where F̃_s and F̂_s denote the fused features, f_s^{(h,w,c)} denotes a feature map, the subscript s denotes the current scale and the superscript (h, w, c) denotes the resolution and channel number at the corresponding scale; conv_{2×2}(·) denotes a 2x2 convolution layer with step size 2 whose number of output channels is twice the number of input channels; upConv_{2×2}(·) denotes a 2x2 transposed convolution layer with step size 2 whose number of output channels is half the number of input channels; conv_{2×2}(·) and upConv_{2×2}(·) downsample and upsample the adjacent layer features, respectively, to the same size as the current scale; (·,·) is the operation of concatenating different features;
S505, construct a GRN module; GRN comprises three steps: global feature aggregation, feature normalization and feature calibration;
for an input feature X of size (H, W, C), i.e. X ∈ R^{H×W×C} with C the number of feature channels:
1) Global feature aggregation
In the global feature aggregation step, the spatial features of each channel are aggregated into a vector by a function g:
G(X) = gx = {||X_1||, ||X_2||, ..., ||X_C||} ∈ R^C
where the L2 norm gives one value per channel feature, so that the i-th aggregated value gx_i = ||X_i|| is a scalar aggregating the statistics of the i-th channel; for an n-dimensional feature x = (x_1, x_2, x_3, ..., x_n), the L2 norm is ||x||_2 = (Σ_{k=1}^{n} x_k^2)^{1/2};
2) Feature normalization
In the feature normalization step, the scalar normalization of the statistic of the i-th channel can be expressed as:
N(||X_i||) = ||X_i|| / Σ_{j=1}^{C} ||X_j||
where ||X_i|| is the L2 norm of the i-th channel and C is the current number of channels;
3) Feature calibration
In the feature calibration step, the normalized score computed in step 2) of step S505 is used to calibrate the original input response:
X_i = X_i * N(G(X)_i) ∈ R^{H×W}
where X_i denotes the i-th feature map, G(·) and N(·) denote global feature aggregation and feature normalization respectively, and H×W is the resolution of the current X_i;
Two additional learnable parameters γ and β, initialized to zero, are added, and a residual connection is further added between the input and output of the GRN layer, so that the final GRN can be expressed as:
X_i = γ * X_i * N(G(X)_i) + β + X_i;
S506, construct the large-kernel GRN channel response module from step S505; the module consists of a layer normalization layer, a 7x7 depthwise convolution layer, a GRN layer and a 3x3 convolution layer; the 7x7 depthwise convolution layer extracts features from the features obtained by the adjacent layer feature fusion of step S504, the GRN performs global response normalization over the extracted feature channels, and the 3x3 convolution layer extracts features and reduces the number of feature channels;
S507, construct the upsampling path from step S506 and step S503; it consists of the residual connection convolution module, a 2x2 transposed convolution layer and a 1x1 convolution layer; the residual connection convolution module first fuses the features extracted by downsampling and the features extracted by upsampling through addition and then extracts features, the 2x2 transposed convolution layer increases the feature resolution, each 2x2 transposed convolution layer doubling the resolution, and the 1x1 convolution layer maps the features to the final segmentation result;
S508, from steps S502, S504, S506 and S507, finally form the adjacent layer feature fusion Unet multi-organ segmentation network combined with large-kernel depthwise convolution, ASF-LKUNet (an illustrative sketch of the key modules follows this claim).
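As an editorial illustration of steps S501, S504, S505 and S506 of this claim (not part of the claimed method), the key modules can be sketched in PyTorch as follows; the class names, the channel handling (a 1x1 convolution after the 7x7 depthwise convolution to change the channel number), the GroupNorm stand-in for layer normalization and the division by the channel sum in GRN are assumptions of the sketch.

import torch
import torch.nn as nn

class LKResidualBlock(nn.Module):
    # S501: 3x3 convolutions extract local features; a 7x7 depthwise convolution
    # on the residual path captures longer-range context; the two paths are summed.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.local = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        self.global_path = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 7, padding=3, groups=in_ch),  # depthwise 7x7
            nn.Conv2d(in_ch, out_ch, 1),                          # match channels
        )
    def forward(self, x):
        return self.local(x) + self.global_path(x)

class GRN(nn.Module):
    # S505: global response normalization over channels for NCHW features.
    def __init__(self, channels):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
    def forward(self, x):
        gx = torch.norm(x, p=2, dim=(2, 3), keepdim=True)   # global feature aggregation
        nx = gx / (gx.sum(dim=1, keepdim=True) + 1e-6)       # feature normalization
        return self.gamma * (x * nx) + self.beta + x          # calibration + residual

class LKGRNBlock(nn.Module):
    # S506: layer norm -> 7x7 depthwise conv -> GRN -> 3x3 conv on the fused features.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.norm = nn.GroupNorm(1, in_ch)   # layer-norm-like normalization
        self.dwconv = nn.Conv2d(in_ch, in_ch, 7, padding=3, groups=in_ch)
        self.grn = GRN(in_ch)
        self.proj = nn.Conv2d(in_ch, out_ch, 3, padding=1)
    def forward(self, x):
        return self.proj(self.grn(self.dwconv(self.norm(x))))

def fuse_adjacent(f_low, f_cur, f_high, down, up):
    # S504: concatenate the lower-scale feature (downsampled by a stride-2 2x2
    # convolution), the current-scale feature and the higher-scale feature
    # (upsampled by a stride-2 2x2 transposed convolution) along the channels.
    return torch.cat([down(f_low), f_cur, up(f_high)], dim=1)

For one scale with c channels, a possible wiring is down = nn.Conv2d(c // 2, c, 2, stride=2), up = nn.ConvTranspose2d(2 * c, c, 2, stride=2), fused = fuse_adjacent(f_prev, f_cur, f_next, down, up) giving (N, 3c, h, w), followed by LKGRNBlock(3 * c, c).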
6. The method for segmenting multiple organs by combining large-kernel convolution and feature fusion of adjacent layers according to claim 3, wherein the specific method in the step S6 is as follows:
S601, construct the loss function of the method from a cross entropy loss function and a Dice loss function;
during the training of the ASF-LKUNet segmentation network, the loss function uses the cross entropy loss L_CE and the Dice loss L_Dice, defined as:
L_CE = − Σ_i Σ_c y_{i,c} log(ŷ_{i,c})
L_Dice = 1 − 2 Σ_i Σ_c y_{i,c} ŷ_{i,c} / (Σ_i Σ_c y_{i,c} + Σ_i Σ_c ŷ_{i,c})
where y represents the label, ŷ represents the predicted value of each category, i indexes a pixel in the feature map and c indexes a category;
In the training process an Adam (adaptive moment estimation) optimizer is adopted; the loss function J(θ) is differentiated with respect to θ and the parameters are updated along the negative gradient direction:
θ′_j = θ_j − σ * (1/m) * Σ_{i=1}^{m} ∂ℓ(h_θ(x_i), y_i)/∂θ_j
where θ′ is the updated network parameter, θ_j is the network parameter before the update, σ is the learning rate, x_i is the training data input to the network, h_θ(x_i) is the network prediction for the training sample, y_i is the corresponding label, ℓ(·,·) is the per-sample loss, and m is the number of samples input in each training step; a group of samples is randomly drawn from the training set and the parameters are updated according to the gradient descent rule after each training step (an illustrative sketch of the loss and update step follows this claim);
S602, train the model with the training set of the data set constructed in step S1, the data obtained in step S3 and the data obtained by the data enhancement of step S4, and select the model with the highest evaluation index during training; the evaluation index is the Dice coefficient (DSC), which measures the similarity of two samples, takes values in [0,1], and the higher the DSC value, the higher the similarity of the two samples; it is defined as:
DSC(Ŷ, Y) = 2|Ŷ ∩ Y| / (|Ŷ| + |Y|)
where Ŷ represents the network segmentation feature map and Y represents the label feature map.
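As an editorial illustration of steps S601 and S602 of this claim (not part of the claimed method), the combined loss and one Adam update step can be sketched in PyTorch as follows; the equal weighting of the two losses, the learning rate and the placeholder names model, x and y are assumptions of the sketch.

import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    # Soft multi-class Dice loss; logits (N, C, H, W), target (N, H, W) of class ids.
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def total_loss(logits, target):
    # Cross entropy plus Dice, as in step S601.
    return F.cross_entropy(logits, target) + dice_loss(logits, target)

# One training step with Adam on a randomly drawn batch (x, y):
#   optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
#   optimizer.zero_grad()
#   loss = total_loss(model(x), y)   # x: (N, 3, H, W) stacked slices, y: (N, H, W) labels
#   loss.backward()
#   optimizer.step()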
7. A segmentation system based on the adjacent layer feature fusion Unet multi-organ segmentation method combined with large-kernel convolution as claimed in any one of claims 1 to 6, comprising:
a large-kernel residual connection convolution encoder, used for taking the slices B as the network input and extracting global and local feature information;
a residual connection convolution decoder, used for extracting multi-resolution deep features and outputting the segmentation result map;
an adjacent layer feature fusion module, used for fusing the features of adjacent layers to obtain lower-layer features with more detail and higher-layer features with more semantics;
and a large-kernel GRN channel response module, used for performing global response normalization on the fused feature channels to strengthen channel selection and for enhancing global and local information extraction through large-kernel depthwise convolution, so that the model fully utilizes the different features and its ability to capture global and local information is improved.
8. A segmentation apparatus based on the adjacent layer feature fusion Unet multi-organ segmentation method in combination with large-kernel convolution as claimed in any one of claims 1 to 6, comprising:
a memory for storing a computer program, data and a model;
and a processor, configured to implement, when executing the computer program, the operations of the adjacent layer feature fusion Unet multi-organ segmentation method combining large-kernel convolution described in steps S1 to S7.
9. A computer readable storage medium for reading and storing programs and data, wherein the computer readable storage medium stores a computer program which, when executed by a processor, can segment organ images based on the adjacent layer feature fusion Unet multi-organ segmentation method combining large-kernel convolution as described in steps S1 to S7.
CN202310707529.6A 2023-06-15 2023-06-15 Adjacent layer feature fusion Unet multi-organ segmentation method, system, equipment and medium combining large-kernel convolution Pending CN116681894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310707529.6A CN116681894A (en) 2023-06-15 2023-06-15 Adjacent layer feature fusion Unet multi-organ segmentation method, system, equipment and medium combining large-kernel convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310707529.6A CN116681894A (en) 2023-06-15 2023-06-15 Adjacent layer feature fusion Unet multi-organ segmentation method, system, equipment and medium combining large-kernel convolution

Publications (1)

Publication Number Publication Date
CN116681894A true CN116681894A (en) 2023-09-01

Family

ID=87778926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310707529.6A Pending CN116681894A (en) 2023-06-15 2023-06-15 Adjacent layer feature fusion Unet multi-organ segmentation method, system, equipment and medium combining large-kernel convolution

Country Status (1)

Country Link
CN (1) CN116681894A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894802A (en) * 2023-09-11 2023-10-17 苏州思谋智能科技有限公司 Image enhancement method, device, computer equipment and storage medium
CN116894802B (en) * 2023-09-11 2023-12-15 苏州思谋智能科技有限公司 Image enhancement method, device, computer equipment and storage medium
CN117496516A (en) * 2023-12-25 2024-02-02 北京航空航天大学杭州创新研究院 Brain tumor MRI image segmentation method and system
CN117496516B (en) * 2023-12-25 2024-03-29 北京航空航天大学杭州创新研究院 Brain tumor MRI image segmentation method and system

Similar Documents

Publication Publication Date Title
TWI762860B (en) Method, device, and apparatus for target detection and training target detection network, storage medium
US11887311B2 (en) Method and apparatus for segmenting a medical image, and storage medium
CN109740665B (en) Method and system for detecting ship target with occluded image based on expert knowledge constraint
WO2020078269A1 (en) Method and device for three-dimensional image semantic segmentation, terminal and storage medium
CN110570353A (en) Dense connection generation countermeasure network single image super-resolution reconstruction method
CN116681894A (en) Adjacent layer feature fusion Unet multi-organ segmentation method, system, equipment and medium combining large-kernel convolution
CN109544681B (en) Fruit three-dimensional digitization method based on point cloud
US20230206603A1 (en) High-precision point cloud completion method based on deep learning and device thereof
CN109584156A (en) Micro- sequence image splicing method and device
CN107688783B (en) 3D image detection method and device, electronic equipment and computer readable medium
CN116563265B (en) Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion
CN113591795A (en) Lightweight face detection method and system based on mixed attention feature pyramid structure
CN113313047B (en) Lane line detection method and system based on lane structure prior
CN110322403A (en) A kind of more supervision Image Super-resolution Reconstruction methods based on generation confrontation network
CN113177592B (en) Image segmentation method and device, computer equipment and storage medium
CN112669348A (en) Fish body posture estimation and fish body phenotype data measurement method and device
CN113343822B (en) Light field saliency target detection method based on 3D convolution
CN110648331A (en) Detection method for medical image segmentation, medical image segmentation method and device
CN114627290A (en) Mechanical part image segmentation algorithm based on improved DeepLabV3+ network
CN110211193A (en) Three dimensional CT interlayer image interpolation reparation and super-resolution processing method and device
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
CN113610178A (en) Inland ship target detection method and device based on video monitoring image
CN111696167A (en) Single image super-resolution reconstruction method guided by self-example learning
CN117437423A (en) Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement
CN113111740A (en) Characteristic weaving method for remote sensing image target detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination