CN116935054A - Semi-supervised medical image segmentation method based on hybrid-decoupling training - Google Patents

Semi-supervised medical image segmentation method based on hybrid-decoupling training

Info

Publication number
CN116935054A
CN116935054A CN202310958092.3A CN202310958092A CN116935054A CN 116935054 A CN116935054 A CN 116935054A CN 202310958092 A CN202310958092 A CN 202310958092A CN 116935054 A CN116935054 A CN 116935054A
Authority
CN
China
Prior art keywords
image
decoupling
prediction
training
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310958092.3A
Other languages
Chinese (zh)
Inventor
龙建武
任岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN202310958092.3A priority Critical patent/CN116935054A/en
Publication of CN116935054A publication Critical patent/CN116935054A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a semi-supervised medical image segmentation method based on hybrid-decoupling training. A perturbation strategy, hybrid-decoupling, is designed to fully regularize a large amount of unlabeled medical image data: first, a labeled image and an unlabeled image are mixed at the data level to obtain a mixed target image; a decoupling operation is then performed at the feature level between the mixed target image and the output prediction of the labeled image to obtain a strong-version prediction of the unlabeled image; the strong-version prediction of the unlabeled image is then forced to be consistent with its direct prediction. In addition, a classified entropy filtering method is designed to screen reliable pseudo labels for unlabeled images for more effective supervision: based on information entropy theory, the segmentation mask corresponding to the direct prediction of the unlabeled image is used as the basis for classification, and reliable pseudo labels are then screened for each class according to an adaptive filtering proportion. The method can mine the abundant information in unlabeled data and makes the model more robust to edge-sensitive information.

Description

Semi-supervised medical image segmentation method based on hybrid-decoupling training
Technical Field
The application relates to the technical field of medical image segmentation, in particular to a semi-supervised medical image segmentation method based on hybrid-decoupling training.
Background
In the field of medical image segmentation, fully supervised segmentation algorithms based on deep learning have achieved excellent results compared with traditional segmentation algorithms, driven by large-scale, finely annotated data. However, expert-level annotation of medical image data is time-consuming, expensive, and sometimes even infeasible in special scenarios and tasks. To overcome the scarcity of labeled data, semi-supervised segmentation has become a typical and mainstream solution: it aims to train a segmentation model approaching the performance of fully supervised methods with a small amount of labeled data and a large amount of unlabeled data. Among such methods, semi-supervised segmentation based on consistency regularization and self-training is dominant owing to its simplicity and effectiveness.
In semi-supervised segmentation, consistency regularization is mainly based on the smoothness and clustering assumptions: it encourages the network to give consistent predictions for unlabeled inputs under perturbations at the feature, model, and data levels, thereby enhancing the robustness of the network. For example, the CCT method proposes cross-consistency training, which applies perturbations to the encoder output of unlabeled data and enforces consistency between the predictions of multiple auxiliary decoders that share one encoder. GCT further performs network perturbation by using two segmentation networks with the same structure but different initializations and enforces consistency between the predictions of the perturbed networks. ICT and ICT-MedSeg mix unlabeled data following the idea of MixUp and encourage the model to match the prediction of the mixed data with the interpolation of the predictions of the corresponding inputs. Unlike MixUp and CutOut, CutMix is an intuitive and powerful data processing method owing to its Copy-Paste (CP) property; some works, such as GuidedMix-Net, apply different CP modes for data perturbation based on CutMix and achieve considerable segmentation performance. Unlike consistency regularization methods, pseudo-label methods follow the idea of self-training: a model pre-trained on labeled data is used to infer unlabeled data, forcing class decision boundaries into low-density regions by minimizing the entropy of unlabeled data. To alleviate the confirmation-bias problem, some pseudo-label approaches attempt to prevent over-fitting to incorrect pseudo labels when a teacher network generates predictions for the input images. For example, FixMatch uses a confidence threshold to select reliable pseudo labels for unlabeled images and then uses these pseudo labels as supervisory signals for network training. U2PL presets a gradually rising proportion of reliable pseudo labels, sets the entropy percentile corresponding to the current proportion as the confidence threshold during training, and filters the pseudo labels accordingly. UPS builds on the idea of FixMatch and additionally considers data uncertainty and model uncertainty, further improving the effectiveness of semi-supervised segmentation.
However, the present inventors have found through research that existing methods have the following disadvantages: a) labeled and unlabeled data are processed under different learning paradigms, which hinders the transfer of common semantic knowledge from labeled data to unlabeled data and causes a mismatch between their empirical distributions; b) perturbation strategies based on the CP mode still do not fully mine the abundant information in unlabeled data, and they break the original semantic structure of the target object to some extent, so the model easily loses sensitive information that is beneficial to segmentation; c) in medical image segmentation tasks with extremely imbalanced classes, methods based on global entropy filtering screen reliable pseudo labels over all pixels, so correct pseudo labels of classes that occupy fewer pixels and are harder to segment are filtered out in large numbers, which is catastrophic for segmenting classes that already lack real labels.
Therefore, in view of the problems of the above methods, it is important to design, in an innovative way, a simple and efficient semi-supervised medical image segmentation method that can exploit unlabeled data.
Disclosure of Invention
Aiming at the technical problems of existing medical image segmentation methods, the application provides a semi-supervised medical image segmentation method based on hybrid-decoupling training, which can make full use of the large amount of prior knowledge learned from labeled data to help the model mine the abundant information in unlabeled data and to make the model more robust to edge-sensitive information.
In order to solve the technical problems, the application adopts the following technical scheme:
a semi-supervised medical image segmentation method based on hybrid-decoupling training comprises the following steps:
S1, design a high-level, multi-aspect perturbation strategy, hybrid-decoupling, to fully regularize a large amount of unlabeled medical image data: in detail, hybrid-decoupling first mixes the labeled image and the unlabeled image at the pixel level from the data layer to obtain a mixed target image, then performs a decoupling operation at the feature level between the mixed target image and the output prediction of the labeled image to obtain a strong-version prediction of the unlabeled image, and then forces the strong-version prediction of the unlabeled image to be consistent with its direct prediction;
S2, design a classified entropy filtering method to screen reliable pseudo labels for unlabeled images for more effective supervision: in detail, based on information entropy theory, the segmentation mask corresponding to the direct prediction of the unlabeled image is first used as the basis for classification, reliable pseudo labels are then screened for each class according to an adaptive filtering proportion, and the strong-version prediction of the unlabeled image is further supervised by the screened reliable pseudo labels to achieve finer segmentation.
Further, in step S1, performing pixel-level mixing of the labeled image and the unlabeled image at the data level to obtain the mixed target image includes:
applying linear interpolation to mix each pixel of a labeled image-unlabeled image pair so as to ensure the transfer of potential supervisory knowledge from labeled data to unlabeled data; within the same training batch, a mixed target image x_mix is randomly constructed between a labeled image x_l and an unlabeled image x_u, where x_mix is obtained by pixel-level linear interpolation according to the following formula:
x_mix = λ·x_l + (1 - λ)·x_u
where λ ∈ (0, 1) is a hyperparameter sampled from a Beta(α, α) distribution with α preset to 0.5, and λ = min(λ, 1 - λ) is set.
Further, in step S1, performing the decoupling operation at the feature level between the mixed target image and the output prediction of the labeled image to obtain the strong-version prediction of the unlabeled image includes:
during training, the labeled image-mixed target image pair (x_l, x_mix) is fed into the student network to obtain the corresponding segmentation feature maps F_l and F_mix, and the unlabeled image x_u is fed into the teacher network to obtain the segmentation feature map F_ut; since the convolution operation is approximately equivariant to shifts and similar transformations, F_mix can be regarded as an approximation of directly mixing F_l and F_u:
F_mix ≈ F_l + F_u
where F_u is the prediction of the student network for the unlabeled image x_u; the purpose of decoupling is then to remove F_l from the output F_mix, thereby obtaining an enhanced version F_udec of F_u, and F_udec then replaces F_u as a trainable signal in the computation of the objective; in particular, because the neural network has the ability to separate the corresponding class channels, and the labeled image has a near-ground-truth segmentation feature map F_l by virtue of its real label, hard decoupling can be used to decouple F_udec of the unlabeled image directly from the segmentation feature map F_mix of the mixed target image, with the specific calculation formula:
F_udec = F_mix - F_l
further, the direct subtraction in the hard decoupling approach counteracts the prediction of the target overlap region in the blended target image, which attenuates F by the following equation mix F in F) l To preserve object detail in the overlapping region:
F udec =F mix -λF l
wherein the setting of the super parameter lambda epsilon (0, 1) is consistent with the image interpolation.
Further, in step S1, forcing the strong-version prediction of the unlabeled image to be consistent with its direct prediction includes:
for the unlabeled image, a softmax activation function is first applied to the segmentation feature map F_ut output by the teacher network and to the enhanced version F_udec obtained by decoupling, yielding the corresponding segmentation prediction probability maps P_ut = softmax(F_ut) and P_udec = softmax(F_udec); the probability map P_ut is the direct prediction of the unlabeled image, and the probability map P_udec is its strong-version prediction; the mean square error (MSE) and the Dice similarity coefficient loss are then used jointly to compute the hybrid-decoupling consistency loss L_mdc, which encourages P_ut and P_udec to remain consistent; L_mdc is calculated as:
L_mdc = L_mse(P_udec, P_ut) + w_dsc·L_dsc(P_udec, P_ut), with
L_mse = (1/(H×W)) Σ_i ||P_udec(i) - P_ut(i)||^2 and
L_dsc = 1 - (1/C) Σ_c (2·Σ_i P_udec(i, c)·P_ut(i, c) + ε) / (Σ_i P_udec(i, c) + Σ_i P_ut(i, c) + ε)
where w_dsc is the weight coefficient controlling the balance between the two losses L_mse and L_dsc, H and W represent the height and width of the image, i indexes the i-th pixel, C represents the number of channels, and ε is the smoothing coefficient of the loss function L_dsc.
Further, the step S2 specifically includes:
S21, denote by p_i ∈ P_ut the softmax probability of the i-th pixel of P_ut over the classes; the information entropy of the i-th pixel is calculated by the following formula:
H(p_i) = - Σ_c p_i(c)·log p_i(c)
where P_ut represents the prediction probability map of the teacher network for the unlabeled image x_u, and p_i(c) is the probability value of p_i on channel c;
S22, a linear adjustment strategy is adopted in which the proportion β_t of screened reliable pixels increases with the number of training rounds t, specifically:
β_t = β_0 + (1 - β_0)·t/T
where β_t is the proportion of reliable pixels at the t-th training round, β_0 is its initial value, and T is the total number of rounds required for model training; that is, the proportion of screened reliable pixels gradually increases from β_0 to 100% as training proceeds;
S23, define γ_jt as the entropy threshold of class j at the t-th training round; γ_jt takes the value of the β_t-quantile of the information entropies of all pixels whose label in y_ut is class j; therefore, the reliable pseudo labels y_u are filtered out class by class according to the following formula:
y_u(i, j) = y_ut(i, j) if H_i^j ≤ γ_jt; otherwise the pixel is discarded as unreliable,
where y_ut(i, j) represents the pseudo label of the i-th pixel predicted as class j, and H_i^j represents the information entropy of the i-th pixel predicted as class j;
S24, the reliable pseudo label y_u is incorporated into the following reliable pixel loss L_rp:
L_rp = L_ce(P_udec, y_u) + w_d·L_dice(P_udec, y_u)
where L_rp consists of the cross-entropy loss L_ce and the Dice loss L_dice, and w_d is the weight coefficient controlling the balance between these two terms.
Further, since the proportion of screened reliable pixels increases with training, which is unfavorable to the convergence of the model, an adaptive weight w_rp is set for L_rp; the adaptive weight w_rp is defined as the inverse of the reliable-pixel percentage multiplied by the base weight η:
w_rp = η·(H×W) / Σ_i 1(pixel i is retained as reliable)
where 1(·) is the indicator function, and H and W represent the height and width of the image, respectively.
Compared with the prior art, the semi-supervised medical image segmentation method based on the hybrid-decoupling training has the following advantages:
1. In the semi-supervised medical image segmentation method based on mixup-decoupling training (Mixup-Decoupling Training, MDT), a high-level, multi-aspect perturbation strategy, Mixup-Decoupling, operating at the data level (mixing) and the feature level (decoupling) is designed to use the knowledge learned from labeled data to help the model mine the rich information in unlabeled data; this is the first time such a technique is applied to a semi-supervised medical image segmentation task, and the strategy not only helps the model learn robust features in a complex environment but also encourages the model to focus on the contour details and complete semantics of the object;
2. To alleviate the problems of confirmation bias and class imbalance, the application further proposes a novel classified entropy filtering method (Categorical Entropy Filtering, CEF), which not only retains a large number of correct pseudo labels but also refines object edges so that the pseudo labels are closer to the real labels (ground truth), thereby screening reliable pseudo labels for unlabeled images for more effective supervision;
3. A large number of comparative experiments on public datasets show that the proposed MDT method achieves new, superior segmentation performance; in addition, the MDT method achieves a significant performance improvement over the baseline while introducing no new training parameters and incurring no additional computational cost.
Drawings
Fig. 1 is a schematic diagram of an overall architecture of a semi-supervised medical image segmentation method based on hybrid-decoupled training provided by the present application.
Fig. 2 is a visual comparison chart of a global entropy filtering and a classified entropy filtering method for filtering reliable pseudo labels provided by the application (a first column and a second column respectively represent real labels corresponding to unlabeled images and pseudo labels from a self-training method, a third column is a pseudo label output by a global entropy filtering mode, and a fourth column is a pseudo label output by a classified entropy filtering mode, wherein the upper left corners of the third column and the fourth column show filtered unreliable pseudo labels).
Fig. 3 is a segmentation visualization comparison between the semi-supervised medical image segmentation method based on hybrid-decoupling training provided by the application and other methods on the ACDC dataset with a labeling ratio of 10%.
Detailed Description
The application is further described below with reference to the drawings, so that the technical means, creative features, objectives, and effects of the application are easy to understand.
The overall architecture of the semi-supervised medical image segmentation method based on hybrid-decoupling training (the MDT method) provided by the application is shown in Fig. 1. First, a mathematical definition of the semi-supervised medical image segmentation problem is given. Given a small labeled image set D_l = {(x_i, y_i)} and a large unlabeled image set D_u = {x_j}, where y_i is the real label corresponding to the i-th labeled image x_i, the application aims to train a well-performing medical image semantic segmentation model by making full use of a large number of unlabeled images when only a small number of labeled images are available. The MDT method follows the mean-teacher architecture, which contains two segmentation networks with identical structures: a student network f_s with trainable parameters θ and a teacher network f_t with parameters θ', where θ' is the exponential moving average of θ; that is, the teacher network parameters are updated from the student network by an exponential moving average (EMA). A labeled image enters the student network and is used for pre-training with its real label. An unlabeled image has two output paths: on one hand, a pseudo-label output is obtained through the teacher network; on the other hand, a robust prediction output is obtained through the Mixup-Decoupling operation. The two outputs are then used for semi-supervised learning of the student network via the consistency paradigm and the pseudo-label paradigm, respectively; the pseudo-label paradigm selects reliable predictions as pseudo labels through classified entropy filtering (CEF) for supervised learning. It should be noted that the direct output of the student network f_s and the teacher network f_t is not the segmentation prediction P but the feature map F, whose number of channels equals the number of segmentation classes C; P is obtained by activating the feature map F with the softmax function, P = softmax(F). In general, MDT aims to use the dataset D_l ∪ D_u to train a better segmentation network that can generate a segmentation prediction P as close as possible to the true mask of the input x. Specifically, for the labeled image set D_l, the student segmentation network is pre-trained using the cross-entropy loss and the Dice loss, which perform well in medical image segmentation; for the unlabeled image set D_u, the student network is further trained in two different ways: 1) hybrid-decoupling consistency regularization; 2) a reliable pseudo-label method.
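As an illustration of the EMA update of the teacher network described above, the following PyTorch sketch applies the usual rule θ' ← m·θ' + (1 - m)·θ; the momentum value m = 0.99 and the function name are illustrative assumptions rather than values fixed by the application.

import torch

@torch.no_grad()
def update_teacher_ema(student: torch.nn.Module, teacher: torch.nn.Module, momentum: float = 0.99) -> None:
    # Teacher parameters become an exponential moving average of the student parameters:
    # teacher_param <- momentum * teacher_param + (1 - momentum) * student_param
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.data.mul_(momentum).add_(s_param.data, alpha=1.0 - momentum)

In practice the teacher is initialized as a copy of the student, and this update is called once per training iteration after the student's optimizer step.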
In order to provide rich and diverse perturbations that mine the rich intrinsic information of unlabeled data, and to attempt to establish semantic links between labeled and unlabeled data so that the labeled data can be used to mine more useful information from the unlabeled data, the application proposes a hybrid-decoupling consistency regularization method. Hybrid-decoupling is a high-level, multi-aspect perturbation strategy: it not only mixes labeled and unlabeled data at the data level, but also performs a decoupling operation at the feature level on the prediction of the mixed target. Specifically, hybrid-decoupling mainly employs two operations: 1) mixing of the labeled-unlabeled image pair; 2) decoupling of the prediction of the mixed image. These are described in turn below.
The application provides a semi-supervised medical image segmentation method based on hybrid-decoupling training, which comprises the following steps:
S1, design a high-level, multi-aspect perturbation strategy, hybrid-decoupling, to fully regularize a large amount of unlabeled medical image data: in detail, hybrid-decoupling first mixes the labeled image and the unlabeled image at the pixel level from the data layer to obtain a mixed target image, then performs a decoupling operation at the feature level between the mixed target image and the output prediction of the labeled image to obtain a strong-version prediction of the unlabeled image, and then forces the strong-version prediction of the unlabeled image to be consistent with its direct prediction; the hybrid-decoupling perturbation strategy not only helps the model learn robust features in a complex environment but also encourages the model to focus on the contour details and complete semantics of the object;
S2, design a classified entropy filtering method to screen reliable pseudo labels for unlabeled images for more effective supervision: in detail, based on information entropy theory, the segmentation mask corresponding to the direct prediction of the unlabeled image is first used as the basis for classification, reliable pseudo labels are then screened for each class according to an adaptive filtering proportion, and the strong-version prediction of the unlabeled image is further supervised by the screened reliable pseudo labels to achieve finer segmentation. The classified entropy filtering method not only retains a large number of correct pseudo labels of classes that occupy fewer pixels and are harder to distinguish, but also refines object edges so that the pseudo labels are closer to the real labels.
As a specific embodiment, in step S1, performing pixel-level mixing of the labeled image and the unlabeled image at the data level to obtain the mixed target image includes:
applying linear interpolation to mix each pixel of a labeled image-unlabeled image pair so as to ensure the transfer of potential supervisory knowledge from labeled data to unlabeled data; within the same training batch, a mixed target image x_mix is randomly constructed between a labeled image x_l and an unlabeled image x_u, where x_mix is obtained by pixel-level linear interpolation according to the following formula:
x_mix = λ·x_l + (1 - λ)·x_u (1)
where λ ∈ (0, 1) is the hyperparameter sampled from the Beta(α, α) distribution, with α preset to 0.5; to retain more of the potential information contained in the unlabeled image x_u while making proper use of the labeled image x_l, λ = min(λ, 1 - λ) is set.
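As a concrete illustration of the pixel-level interpolation in formula (1), the following minimal sketch (with illustrative tensor and function names) mixes one labeled batch with one unlabeled batch:

import numpy as np
import torch

def mixup_images(x_l: torch.Tensor, x_u: torch.Tensor, alpha: float = 0.5):
    # Sample lambda from Beta(alpha, alpha) and clip it so that the unlabeled image
    # keeps the larger share of the mixture, i.e. lambda = min(lambda, 1 - lambda).
    lam = float(np.random.beta(alpha, alpha))
    lam = min(lam, 1.0 - lam)
    x_mix = lam * x_l + (1.0 - lam) * x_u   # pixel-level linear interpolation, formula (1)
    return x_mix, lam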
As a specific embodiment, in step S1, performing the decoupling operation at the feature level between the mixed target image and the output prediction of the labeled image to obtain the strong-version prediction of the unlabeled image includes:
a decoupling operation is performed at the feature level on the prediction of the mixed target image, thereby generating a robust prediction for the unlabeled image. Specifically, during training, the labeled image-mixed target image pair (x_l, x_mix) is fed into the student network to obtain the corresponding segmentation feature maps F_l and F_mix, and the unlabeled image x_u is fed into the teacher network to obtain the segmentation feature map F_ut; since the convolution operation is approximately equivariant to shifts and similar transformations, a shift applied to the input image can still be detected as a corresponding shift in the output features, and this property also carries over to the mixed data, so F_mix can be regarded as an approximation of directly mixing F_l and F_u:
F_mix ≈ F_l + F_u (2)
where F_u is the prediction of the student network for the unlabeled image x_u; the purpose of decoupling is then to remove F_l from the output F_mix, thereby obtaining an enhanced version F_udec of F_u, and F_udec then replaces F_u as a trainable signal in the computation of the objective; in particular, because the neural network has the ability to separate the corresponding class channels, and the labeled image has a near-ground-truth segmentation feature map F_l by virtue of its real label, hard decoupling can be used to decouple F_udec of the unlabeled image directly from the segmentation feature map F_mix of the mixed target image, with the specific calculation formula:
F_udec = F_mix - F_l (3)
As a specific example, the inventors of the application found through research that the direct subtraction of hard decoupling cancels out the prediction of the region where the targets overlap in the mixed target image; to overcome this problem, F_l in F_mix is attenuated by the following formula so as to preserve object detail in the overlapping region:
F_udec = F_mix - λ·F_l (4)
where the setting of the hyperparameter λ ∈ (0, 1) is consistent with that used in the image interpolation. This way of decoupling is called soft decoupling; it takes the potential overlap problem in the mixed image into account and tends to preserve the local details of the object.
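A minimal sketch of the hard and soft decoupling of formulas (3) and (4) is given below; it assumes the student network has already produced the feature maps F_mix and F_l, and the function name and boolean switch are illustrative.

import torch

def decouple_features(f_mix: torch.Tensor, f_l: torch.Tensor, lam: float, soft: bool = True) -> torch.Tensor:
    # Hard decoupling: F_udec = F_mix - F_l          (formula (3))
    # Soft decoupling: F_udec = F_mix - lam * F_l    (formula (4); attenuates F_l to keep
    #                                                 detail in overlapping target regions)
    if soft:
        return f_mix - lam * f_l
    return f_mix - f_l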
As a specific embodiment, in step S1, forcing the strong-version prediction of the unlabeled image to be consistent with its direct prediction includes:
for the unlabeled image, a softmax activation function is first applied to the segmentation feature map F_ut output by the teacher network and to the enhanced version F_udec obtained by decoupling, yielding the corresponding segmentation prediction probability maps P_ut = softmax(F_ut) and P_udec = softmax(F_udec); the probability map P_ut is the direct prediction of the unlabeled image, and the probability map P_udec is its strong-version prediction; the mean square error (MSE) and the Dice similarity coefficient loss are then used jointly to compute the hybrid-decoupling consistency loss L_mdc, which encourages P_ut and P_udec to remain consistent; L_mdc is calculated as:
L_mdc = L_mse(P_udec, P_ut) + w_dsc·L_dsc(P_udec, P_ut), with
L_mse = (1/(H×W)) Σ_i ||P_udec(i) - P_ut(i)||^2 and
L_dsc = 1 - (1/C) Σ_c (2·Σ_i P_udec(i, c)·P_ut(i, c) + ε) / (Σ_i P_udec(i, c) + Σ_i P_ut(i, c) + ε)
where w_dsc is the weight coefficient controlling the balance between the two losses L_mse and L_dsc, H and W represent the height and width of the image, i indexes the i-th pixel, C represents the number of channels, and ε is the smoothing coefficient of the loss function L_dsc.
Owing to the nature of the data mixing and decoupling operations, and to the introduction of the Dice similarity loss, which can focus on the regional context of the segmentation target, into the consistency constraint, the hybrid-decoupling consistency not only helps the model learn more robust features of the object in a complex environment but also encourages the model to focus on the contour details and complete semantics of the object.
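The hybrid-decoupling consistency loss can be sketched as follows; the channel-wise soft Dice with smoothing ε is one standard formulation assumed here for illustration, and the default value of w_dsc is a placeholder rather than a setting taken from the application.

import torch
import torch.nn.functional as F

def soft_dice_loss(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Soft Dice loss between two probability maps of shape (B, C, H, W), averaged over channels.
    dims = (0, 2, 3)
    inter = (p * q).sum(dim=dims)
    denom = p.sum(dim=dims) + q.sum(dim=dims)
    dice = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()

def mixup_decoupling_consistency(f_udec: torch.Tensor, f_ut: torch.Tensor, w_dsc: float = 0.5) -> torch.Tensor:
    # L_mdc = MSE(P_udec, P_ut) + w_dsc * Dice(P_udec, P_ut)
    p_udec = torch.softmax(f_udec, dim=1)           # strong-version prediction (decoupled)
    p_ut = torch.softmax(f_ut, dim=1).detach()      # direct teacher prediction (no gradient)
    return F.mse_loss(p_udec, p_ut) + w_dsc * soft_dice_loss(p_udec, p_ut)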
To mitigate the effect of incorrect pseudo labels, methods based on global entropy filtering (Global Entropy Filtering, GEF) have been widely used to filter out unreliable pseudo labels or to reduce their weight; specifically, only high-confidence predictions are used as pseudo labels, while ambiguous predictions are discarded. However, such methods do not work well because of the class-imbalance problem in medical image segmentation tasks. The reason is that this type of approach treats every class identically and sets a single screening threshold from the entropy of the overall pixel probability distribution, so correct pseudo labels of classes that occupy fewer pixels and are harder to distinguish are filtered out in large numbers. As can be seen from the third column of the pictures in Fig. 2, these global-entropy-filtering-based methods largely filter out the correct pseudo labels of the RV and Myo categories, and even discard all pseudo labels of the category LV, which is catastrophic for segmenting categories that already lack real labels. Unlike these methods, the pseudo-label approach of the application takes the segmentation mask y_ut = argmax(P_ut), produced by the teacher network for the unlabeled image, as the pseudo label to constrain the segmentation prediction P_udec of the unlabeled image obtained through hybrid-decoupling. In addition, in view of the class-imbalance and confirmation-bias problems of the pseudo-label approach, a novel classified entropy filtering (Categorical Entropy Filtering, CEF) method is designed. Specifically, the CEF method simply uses the pseudo label y_ut itself and the information entropy of each pixel as the basis for classification, and screens the reliable labels of each category separately.
As a specific embodiment, the specific calculation manner of the classification entropy filtering method is as follows, that is, the step S2 specifically includes:
S21, denote by p_i ∈ P_ut the softmax probability of the i-th pixel of P_ut over the classes; the information entropy of the i-th pixel is calculated by the following formula:
H(p_i) = - Σ_c p_i(c)·log p_i(c)
where P_ut represents the prediction probability map of the teacher network for the unlabeled image x_u, and p_i(c) is the probability value of p_i on channel c.
S22, considering that the performance of the model gradually improves and becomes more reliable during training, a linear adjustment strategy is adopted in which the proportion β_t of screened reliable pixels increases with the number of training rounds t, specifically:
β_t = β_0 + (1 - β_0)·t/T
where β_t is the proportion of reliable pixels at the t-th training round, β_0 is its initial value, and T is the total number of rounds required for model training; that is, the proportion of screened reliable pixels gradually increases from β_0 to 100% as training proceeds.
S23, define γ_jt as the entropy threshold of class j at the t-th training round; γ_jt takes the value of the β_t-quantile of the information entropies of all pixels whose label in y_ut is class j; therefore, the reliable pseudo labels y_u are filtered out class by class according to the following formula:
y_u(i, j) = y_ut(i, j) if H_i^j ≤ γ_jt; otherwise the pixel is discarded as unreliable,
where y_ut(i, j) represents the pseudo label of the i-th pixel predicted as class j, and H_i^j represents the information entropy of the i-th pixel predicted as class j. Suppose the true label of pixel i is k; because the teacher network has a certain recognition ability, even if the network mistakes pixel i as belonging to class j, pixel i is very likely to have a higher information entropy, and since reliable pseudo labels are screened according to the entropy-filtering proportion, pixel i is more likely to be filtered out than other pixels of the same class; it is therefore reasonable to use the pseudo label y_ut itself as the prior information for classification. As can be seen from the fourth column of the visualized pictures in Fig. 2, the classified entropy filtering method not only retains a large number of correct pseudo labels but also refines object edges so that the pseudo labels are closer to the real labels.
S24, after the reliable pseudo label y_u is obtained, it is incorporated into the following reliable pixel loss L_rp:
L_rp = L_ce(P_udec, y_u) + w_d·L_dice(P_udec, y_u)
where L_rp consists of the cross-entropy loss L_ce and the Dice loss L_dice, and w_d is the weight coefficient controlling the balance between these two terms and is set to 0.1.
As a specific embodiment, since the proportion of screened reliable pixels increases with training, which is unfavorable to the convergence of the model, an adaptive weight w_rp is set for L_rp; the adaptive weight w_rp is defined as the inverse of the reliable-pixel percentage multiplied by the base weight η:
w_rp = η·(H×W) / Σ_i 1(pixel i is retained as reliable)
where 1(·) is the indicator function, H and W represent the height and width of the image, respectively, and the setting of η depends on the dataset.
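The classified entropy filtering of S21-S23 and the reliable pixel loss of S24 can be sketched as follows; the per-class quantile thresholding, the linear ramp of the kept proportion, and the adaptive weight follow the description above, while the initial ratio, the default η, and the function names are illustrative assumptions.

import torch
import torch.nn.functional as F

def keep_ratio_schedule(t: int, total: int, start: float = 0.2) -> float:
    # Linear ramp of the reliable-pixel proportion from `start` (assumed value) to 1.0.
    return min(1.0, start + (1.0 - start) * t / max(total, 1))

def categorical_entropy_filter(p_ut: torch.Tensor, keep_ratio: float):
    # p_ut: teacher probability map of shape (B, C, H, W); keep_ratio in (0, 1].
    entropy = -(p_ut * torch.log(p_ut.clamp_min(1e-12))).sum(dim=1)   # per-pixel entropy, (B, H, W)
    y_ut = p_ut.argmax(dim=1)                                         # pseudo-label map, (B, H, W)
    reliable = torch.zeros_like(y_ut, dtype=torch.bool)
    for j in y_ut.unique():
        cls_mask = y_ut == j
        gamma_jt = torch.quantile(entropy[cls_mask], keep_ratio)      # class-wise entropy threshold
        reliable |= cls_mask & (entropy <= gamma_jt)                  # keep low-entropy pixels of class j
    return y_ut, reliable

def reliable_pixel_loss(f_udec, y_u, reliable, w_d: float = 0.1, eta: float = 0.5):
    # Cross-entropy + w_d * Dice over reliable pixels, scaled by the adaptive weight w_rp.
    p_udec = torch.softmax(f_udec, dim=1)
    ce = F.cross_entropy(f_udec, y_u, reduction="none")
    ce = (ce * reliable).sum() / reliable.sum().clamp_min(1)
    one_hot = F.one_hot(y_u, num_classes=f_udec.shape[1]).permute(0, 3, 1, 2).float()
    mask = reliable.unsqueeze(1).float()
    inter = (p_udec * one_hot * mask).sum()
    denom = ((p_udec + one_hot) * mask).sum()
    dice = 1.0 - (2.0 * inter + 1e-5) / (denom + 1e-5)
    w_rp = eta * reliable.numel() / reliable.sum().clamp_min(1)       # eta / reliable-pixel fraction
    return w_rp * (ce + w_d * dice)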
In summary, the optimization objective of the semi-supervised strategy MDT proposed by the application is to minimize the total loss function L_total, which can be formulated as:
L_total = L_sup + w_c·L_mdc + w_rp·L_rp
where L_sup represents the supervised loss function applied to labeled images and is a linear combination of the cross-entropy loss L_ce and the Dice loss L_dice computed between the predictions for the labeled images and their real labels. On the other hand, the unsupervised loss applied to unlabeled images is jointly formed by the consistency loss L_mdc and the reliable pixel loss L_rp; w_c is the weight used to adjust L_mdc so as to mitigate the effect of inaccurate predictions at the early stage of training.
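Putting the pieces together, one training iteration might look like the following sketch; it reuses the helper functions from the earlier sketches (mixup_images, decouple_features, soft_dice_loss, mixup_decoupling_consistency, keep_ratio_schedule, categorical_entropy_filter, reliable_pixel_loss, update_teacher_ema), and the ramp-up of w_c, the default weights, and the wiring of the two networks are illustrative choices rather than the exact settings of the application.

import torch
import torch.nn.functional as F

def train_step(student, teacher, optimizer, x_l, y_l, x_u, step, total_steps,
               w_c_max: float = 0.1, w_dsc: float = 0.5, w_d: float = 0.1, eta: float = 0.5):
    x_mix, lam = mixup_images(x_l, x_u)                       # data-level mixing
    f_l, f_mix = student(x_l), student(x_mix)                 # student feature maps
    with torch.no_grad():
        f_ut = teacher(x_u)                                   # teacher feature map (direct prediction)
    f_udec = decouple_features(f_mix, f_l, lam, soft=True)    # feature-level (soft) decoupling

    # Supervised loss on labeled data: cross-entropy plus Dice.
    p_l = torch.softmax(f_l, dim=1)
    y_l_onehot = F.one_hot(y_l, p_l.shape[1]).permute(0, 3, 1, 2).float()
    l_sup = F.cross_entropy(f_l, y_l) + soft_dice_loss(p_l, y_l_onehot)

    # Unsupervised losses: consistency (L_mdc) and reliable-pixel supervision (w_rp applied inside).
    l_mdc = mixup_decoupling_consistency(f_udec, f_ut, w_dsc)
    y_u, reliable = categorical_entropy_filter(torch.softmax(f_ut, dim=1),
                                               keep_ratio_schedule(step, total_steps))
    l_rp = reliable_pixel_loss(f_udec, y_u, reliable, w_d, eta)

    w_c = w_c_max * min(1.0, step / max(total_steps, 1))      # simple ramp-up for early training
    loss = l_sup + w_c * l_mdc + l_rp
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_teacher_ema(student, teacher)                      # EMA update of the teacher
    return float(loss.detach())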
The specific settings and details of the application are as follows: the proposed MDT method is evaluated on the public MRI datasets ACDC 2017 (a 2D dataset) and LA (a 3D dataset). The ACDC 2017 dataset comes from the MICCAI 2017 ACDC challenge and consists of 200 short-axis cardiac MRI volumes from 100 patients, with expert annotations of three structures: left ventricle (LV), myocardium (Myo), and right ventricle (RV). The LA dataset comes from the 2018 atrial segmentation challenge and contains 100 gadolinium-enhanced MRI scans, which are fixedly divided into 80 samples for training and 20 samples for validation. For the LA dataset, four widely used metrics are employed to evaluate the segmentation performance of the model: 1) the Dice similarity coefficient (DSC); 2) the Jaccard similarity coefficient (Jaccard); 3) the 95% Hausdorff distance (95HD); 4) the average symmetric distance (ASD). For the ACDC dataset, the three metrics DSC, 95HD, and ASD are used to evaluate model performance. It is worth noting that the DSC and Jaccard metrics are more sensitive to the interior filling of the segmentation results, whereas the 95HD and ASD metrics are more sensitive to the boundaries of the segmentation results.
The MDT network architecture provided by the application does not rely on any auxiliary training module and only follows the mean-teacher architecture. For a fair comparison, the same experimental setup is used for all comparative and ablation experiments. Specifically, the network is trained with an SGD optimizer with an initial learning rate of 10^-2 and a weight decay coefficient of 10^-4, and the learning rate is updated with a poly learning-rate policy during training. In addition, common rotation and flipping operations are used to augment the training data. On the ACDC dataset, U-Net for 2D segmentation is selected as the backbone of the MDT network architecture; in particular, before the labeled and unlabeled data are mixed, jittering of brightness, contrast, saturation, and hue is applied to them to generate more perturbations, and the remaining hyperparameter and η are set to 0.1 and 2.5, respectively. In training, the batch size is set to 24, each training batch contains 12 labeled and 12 unlabeled samples, and the total number of training iterations is 30K. On the LA dataset, V-Net for 3D segmentation is selected as the backbone of the MDT network architecture, and the remaining hyperparameter and η are set to 0.01 and 0.5, respectively. In training, the batch size is set to 4, each training batch contains 2 labeled and 2 unlabeled samples, and the total number of training iterations is 15K. In addition, all verification experiments of the application are performed in the same environment with a fixed random seed, all experimental frameworks are implemented with PyTorch, and all experiments are run on a server with a single NVIDIA GeForce RTX 3090 GPU.
Considering that there is a serious class-imbalance problem in medical images (i.e., the background occupies a substantial proportion of the pixels), the confidence threshold is set individually for the set of pixels whose pseudo label belongs to the background class, i.e., a maximum proportion of unreliable pixels is set separately for the background class. Specifically, for the ACDC dataset with 5% and 10% labeling ratios, this maximum proportion is set to 0.005 and 0.01, respectively; for the LA dataset with 5% and 10% labeling ratios, it is set to 0.05 and 0.1, respectively.
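As one plausible realization of the optimizer setting described above (SGD, initial learning rate 10^-2, weight decay 10^-4, poly learning-rate policy), the following sketch applies the conventional poly decay lr = lr0·(1 - iter/max_iter)^power; the momentum 0.9 and the exponent 0.9 are customary defaults assumed here, not values stated in the application.

import torch

def build_optimizer(model: torch.nn.Module, base_lr: float = 1e-2, weight_decay: float = 1e-4):
    return torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9, weight_decay=weight_decay)

def poly_lr(optimizer, base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    # Decay the learning rate polynomially as training progresses.
    lr = base_lr * (1.0 - cur_iter / float(max_iter)) ** power
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr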
To verify the superior performance of the proposed MDT method, Table 1 presents comparative experiments between the MDT method and recent excellent semi-supervised medical image segmentation methods at different labeling ratios (5% and 10%) on the ACDC dataset. These advanced methods include UA-MT, ICT, SASSNet, CPS, DTC, MC-Net, CTCT, SS-Net, MC-Net+ and BCP, and for a fair comparison their performance results are mostly taken from the data reported by CTCT and BCP. As shown in Table 1, MDT is very competitive on all three evaluation metrics DSC, 95HD, and ASD without any auxiliary training module. In particular, at a labeling ratio of 10%, MDT outperforms the latest excellent method BCP, improving performance by 0.61%, 2.56, and 0.66 on DSC, 95HD, and ASD, respectively. Compared with the baseline U-Net trained with only the labeled data, MDT significantly improves the three evaluation metrics by 10.04%, 7.93, and 2.19. It is worth emphasizing that even when U-Net uses all of the labeled data (i.e., full supervision), MDT achieves a DSC score close to the fully supervised one (89.45% vs. 91.44%) and improves performance by 2.88 and 0.48 on 95HD and ASD, respectively, which also indirectly reflects that MDT is superior on the contours and edges of the segmentation target. In addition, at a labeling ratio of 5%, MDT reaches a 95HD score comparable to BCP and slightly improves DSC and ASD. Compared with the MC-Net+ method, MDT achieves a large performance improvement, improving DSC, 95HD, and ASD by 20.47%, 5.85, and 1.53, respectively. Furthermore, Fig. 3 shows segmentation visualization examples of MDT and other methods on the ACDC dataset with a 10% labeling ratio. It can be seen from Fig. 3 that MDT not only achieves anatomically plausible segmentation but also performs better on the boundary details of the segmentation target. All these comparisons show that MDT applies the perturbation strategy Mixup-Decoupling at the data level and the feature level to make full use of the large amount of prior knowledge learned from the labeled data to learn robust features of the segmentation target, and fully exploits the reliable information of the pseudo labels with the classified entropy filtering method; this not only helps the model focus on more robust semantic features of the segmented object in a complex environment, but also improves the model's understanding of the target contour and semantics, thereby utilizing the information of unlabeled data to a greater extent.
To further verify the effectiveness of the proposed MDT method, extended experiments are performed on the 3D medical dataset LA. Specifically, MDT is compared in detail with several advanced semi-supervised medical image segmentation methods, namely UA-MT, SASSNet, DTC, URPC, MC-Net, SS-Net, MC-Net+ and BCP, at different labeling ratios (5% and 10%), as shown in Table 2. When trained with only 5% or 10% of the labeled data, MDT produces a significant performance gain over the baseline on all evaluation metrics, which benefits from MDT's ability to focus on the valid information of the unlabeled data. In addition, using only 10% of the labeled data, MDT achieves competitive DSC performance (89.74% vs. 91.47%) compared with the upper-bound method (V-Net using 100% of the labeled data). At a labeling ratio of 10%, MDT slightly improves performance on the DSC and Jaccard metrics compared with the BCP method (note that, unlike the other methods, BCP uses a larger training batch size of 8). When very little labeled data is used, i.e., at a labeling ratio of 5%, MDT improves on all four evaluation metrics to a certain extent (improving on the second-best method BCP by 0.85%, 1.34%, 0.09, and 0.29, respectively). This illustrates that, when the amount of labeled data is limited, MDT can learn robust knowledge from a large amount of unlabeled data by virtue of its novel Mixup-Decoupling strategy.
TABLE 1
TABLE 2
As shown in Table 3, in order to show the effect of each component in the proposed MDT method, ablation experiments are performed on the ACDC dataset with labeling ratios of 5% and 10%. Table 3 clearly shows that, at both labeling ratios, whether the consistency-based regularization L_mdc is used to pursue pixel-level smoothness or the pseudo-label-based reliable pixel loss L_rp is used to encourage the network toward fine segmentation, the segmentation performance is obviously improved. As can be seen from Table 3, when training with only 5% of the labeled data, using L_mdc and L_rp yields huge DSC gains of 11.91% and 37.44% (59.74% vs. 47.83% and 85.27% vs. 47.83%), respectively. Similarly, when training with 10% of the labeled data, using L_mdc and L_rp also achieves significant DSC gains (86.19% vs. 79.41% and 88.91% vs. 79.41%), respectively. Furthermore, they achieve considerable performance improvements on the 95HD and ASD evaluation metrics. In MDT, each component achieves excellent results from a different starting point, and L_mdc and L_rp complement each other in learning the edges and semantics of the segmentation target, so combining them further improves performance.
TABLE 3
To verify the effectiveness of the proposed pseudo-label filtering strategy CEF, ablation experiments on different filtering strategies are performed on the ACDC dataset with labeling ratios of 5% and 10%, as shown in Table 4. It can be seen that the filtering strategy of global entropy filtering (GEF), which does not treat each class separately, is not ideal; it even performs worse than using the pseudo labels without any filtering (72.54% vs. 83.75% and 83.31% vs. 87.68%). The quantitative experiments show that the classified entropy filtering (CEF) strategy, which uses the pseudo labels as the basis for classification, not only largely retains the correct pseudo labels of the minority classes and makes the model robust to inaccurate pseudo labels, but also refines the contour edges of the segmented object, thereby achieving a relatively obvious performance improvement (85.27% vs. 83.75% and 88.91% vs. 87.68%). In addition, the visual comparison in Fig. 2 complements this verification experiment from a qualitative perspective.
TABLE 4
Compared with the prior art, the semi-supervised medical image segmentation method based on the hybrid-decoupling training has the following advantages:
1. In the semi-supervised medical image segmentation method based on mixup-decoupling training (Mixup-Decoupling Training, MDT), a high-level, multi-aspect perturbation strategy, Mixup-Decoupling, operating at the data level (mixing) and the feature level (decoupling) is designed to use the knowledge learned from labeled data to help the model mine the rich information in unlabeled data; this is the first time such a technique is applied to a semi-supervised medical image segmentation task, and the strategy not only helps the model learn robust features in a complex environment but also encourages the model to focus on the contour details and complete semantics of the object;
2. To alleviate the problems of confirmation bias and class imbalance, the application further proposes a novel classified entropy filtering method (Categorical Entropy Filtering, CEF), which not only retains a large number of correct pseudo labels but also refines object edges so that the pseudo labels are closer to the real labels (ground truth), thereby screening reliable pseudo labels for unlabeled images for more effective supervision;
3. A large number of comparative experiments on public datasets show that the proposed MDT method achieves new, superior segmentation performance; in addition, the MDT method achieves a significant performance improvement over the baseline while introducing no new training parameters and incurring no additional computational cost.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered by the scope of the claims of the present application.

Claims (7)

1. The semi-supervised medical image segmentation method based on the hybrid-decoupling training is characterized by comprising the following steps of:
S1, design a high-level, multi-aspect perturbation strategy, hybrid-decoupling, to fully regularize a large amount of unlabeled medical image data: in detail, hybrid-decoupling first mixes the labeled image and the unlabeled image at the pixel level from the data layer to obtain a mixed target image, then performs a decoupling operation at the feature level between the mixed target image and the output prediction of the labeled image to obtain a strong-version prediction of the unlabeled image, and then forces the strong-version prediction of the unlabeled image to be consistent with its direct prediction;
S2, design a classified entropy filtering method to screen reliable pseudo labels for unlabeled images for more effective supervision: in detail, based on information entropy theory, the segmentation mask corresponding to the direct prediction of the unlabeled image is first used as the basis for classification, reliable pseudo labels are then screened for each class according to an adaptive filtering proportion, and the strong-version prediction of the unlabeled image is further supervised by the screened reliable pseudo labels to achieve finer segmentation.
2. The semi-supervised medical image segmentation method based on hybrid-decoupling training according to claim 1, wherein, in step S1, performing pixel-level mixing of the labeled image and the unlabeled image at the data level to obtain the mixed target image comprises:
applying linear interpolation to mix each pixel of a labeled image-unlabeled image pair so as to ensure the transfer of potential supervisory knowledge from labeled data to unlabeled data; within the same training batch, a mixed target image x_mix is randomly constructed between a labeled image x_l and an unlabeled image x_u, where x_mix is obtained by pixel-level linear interpolation according to the following formula:
x_mix = λ·x_l + (1 - λ)·x_u
where λ ∈ (0, 1) is a hyperparameter sampled from a Beta(α, α) distribution with α preset to 0.5, and λ = min(λ, 1 - λ) is set.
3. The semi-supervised medical image segmentation method based on hybrid-decoupling training according to claim 1, wherein performing the decoupling operation at the feature level between the mixed target image and the output prediction of the labeled image to obtain the strong-version prediction of the unlabeled image in step S1 comprises:
during training, the labeled image-mixed target image pair (x_l, x_mix) is fed into the student network to obtain the corresponding segmentation feature maps F_l and F_mix, and the unlabeled image x_u is fed into the teacher network to obtain the segmentation feature map F_ut; since the convolution operation is approximately equivariant to shifts and similar transformations, F_mix can be regarded as an approximation of directly mixing F_l and F_u:
F_mix ≈ F_l + F_u
where F_u is the prediction of the student network for the unlabeled image x_u; the purpose of decoupling is then to remove F_l from the output F_mix, thereby obtaining an enhanced version F_udec of F_u, and F_udec then replaces F_u as a trainable signal in the computation of the objective; in particular, because the neural network has the ability to separate the corresponding class channels, and the labeled image has a near-ground-truth segmentation feature map F_l by virtue of its real label, hard decoupling is used to decouple F_udec of the unlabeled image directly from the segmentation feature map F_mix of the mixed target image, with the specific calculation formula:
F_udec = F_mix - F_l
4. The semi-supervised medical image segmentation method based on hybrid-decoupling training according to claim 3, wherein the direct subtraction in hard decoupling cancels out the prediction of the region where the targets overlap in the mixed target image; to preserve object detail in the overlapping region, F_l in F_mix is attenuated by the following formula:
F_udec = F_mix - λ·F_l
where the setting of the hyperparameter λ ∈ (0, 1) is consistent with that used in the image interpolation.
5. The semi-supervised medical image segmentation method based on hybrid-decoupling training according to claim 1, wherein forcing the strong-version prediction of the unlabeled image to be consistent with its direct prediction in step S1 comprises:
for the unlabeled image, a softmax activation function is first applied to the segmentation feature map F_ut output by the teacher network and to the enhanced version F_udec obtained by decoupling, yielding the corresponding segmentation prediction probability maps P_ut = softmax(F_ut) and P_udec = softmax(F_udec); the probability map P_ut is the direct prediction of the unlabeled image, and the probability map P_udec is its strong-version prediction; the mean square error (MSE) and the Dice similarity coefficient loss are then used jointly to compute the hybrid-decoupling consistency loss L_mdc, which encourages P_ut and P_udec to remain consistent; L_mdc is calculated as:
L_mdc = L_mse(P_udec, P_ut) + w_dsc·L_dsc(P_udec, P_ut), with
L_mse = (1/(H×W)) Σ_i ||P_udec(i) - P_ut(i)||^2 and
L_dsc = 1 - (1/C) Σ_c (2·Σ_i P_udec(i, c)·P_ut(i, c) + ε) / (Σ_i P_udec(i, c) + Σ_i P_ut(i, c) + ε)
where w_dsc is the weight coefficient controlling the balance between the two losses L_mse and L_dsc, H and W represent the height and width of the image, i indexes the i-th pixel, C represents the number of channels, and ε is the smoothing coefficient of the loss function L_dsc.
6. The semi-supervised medical image segmentation method based on hybrid-decoupling training according to claim 1, wherein step S2 specifically comprises:
S21, denote by p_i ∈ P_ut the softmax probability of the i-th pixel of P_ut over the classes; the information entropy of the i-th pixel is calculated by the following formula:
H(p_i) = - Σ_c p_i(c)·log p_i(c)
where P_ut represents the prediction probability map of the teacher network for the unlabeled image x_u, and p_i(c) is the probability value of p_i on channel c;
S22, a linear adjustment strategy is adopted in which the proportion β_t of screened reliable pixels increases with the number of training rounds t, specifically:
β_t = β_0 + (1 - β_0)·t/T
where β_t is the proportion of reliable pixels at the t-th training round, β_0 is its initial value, and T is the total number of rounds required for model training; that is, the proportion of screened reliable pixels gradually increases from β_0 to 100% as training proceeds;
S23, define γ_jt as the entropy threshold of class j at the t-th training round; γ_jt takes the value of the β_t-quantile of the information entropies of all pixels whose label in y_ut is class j; therefore, the reliable pseudo labels y_u are filtered out class by class according to the following formula:
y_u(i, j) = y_ut(i, j) if H_i^j ≤ γ_jt; otherwise the pixel is discarded as unreliable,
where y_ut(i, j) represents the pseudo label of the i-th pixel predicted as class j, and H_i^j represents the information entropy of the i-th pixel predicted as class j;
S24, the reliable pseudo label y_u is incorporated into the following reliable pixel loss L_rp:
L_rp = L_ce(P_udec, y_u) + w_d·L_dice(P_udec, y_u)
where L_rp consists of the cross-entropy loss L_ce and the Dice loss L_dice, and w_d is the weight coefficient controlling the balance between these two terms.
7. The semi-supervised medical image segmentation method based on hybrid-decoupling training according to claim 6, wherein, since the proportion of screened reliable pixels increases with training, which is unfavorable to the convergence of the model, an adaptive weight w_rp is set for L_rp; the adaptive weight w_rp is defined as the inverse of the reliable-pixel percentage multiplied by the base weight η:
w_rp = η·(H×W) / Σ_i 1(pixel i is retained as reliable)
where 1(·) is the indicator function, and H and W represent the height and width of the image, respectively.
CN202310958092.3A 2023-08-01 2023-08-01 Semi-supervised medical image segmentation method based on hybrid-decoupling training Pending CN116935054A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310958092.3A CN116935054A (en) 2023-08-01 2023-08-01 Semi-supervised medical image segmentation method based on hybrid-decoupling training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310958092.3A CN116935054A (en) 2023-08-01 2023-08-01 Semi-supervised medical image segmentation method based on hybrid-decoupling training

Publications (1)

Publication Number Publication Date
CN116935054A true CN116935054A (en) 2023-10-24

Family

ID=88387738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310958092.3A Pending CN116935054A (en) 2023-08-01 2023-08-01 Semi-supervised medical image segmentation method based on hybrid-decoupling training

Country Status (1)

Country Link
CN (1) CN116935054A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649528A (en) * 2024-01-29 2024-03-05 山东建筑大学 Semi-supervised image segmentation method, system, electronic equipment and storage medium
CN117649528B (en) * 2024-01-29 2024-05-31 山东建筑大学 Semi-supervised image segmentation method, system, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination