CN115393289A - Tumor image semi-supervised segmentation method based on integrated cross pseudo label - Google Patents
- Publication number
- CN115393289A CN115393289A CN202210940799.7A CN202210940799A CN115393289A CN 115393289 A CN115393289 A CN 115393289A CN 202210940799 A CN202210940799 A CN 202210940799A CN 115393289 A CN115393289 A CN 115393289A
- Authority
- CN
- China
- Prior art keywords
- segmentation
- model
- models
- data
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
- G06V10/7515—Shifting the patterns to accommodate for positional errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10088—Magnetic resonance imaging [MRI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of medical image processing, and particularly relates to a tumor image semi-supervised segmentation method based on integrated cross pseudo labels. The method trains three segmentation models with large architectural differences: a UNETR model based on the Transformer architecture, an attention U-Net based on a convolutional neural network (CNN), and CSA-U-Net, a CNN-based attention segmentation network exploiting multi-scale feature information; these models generate pseudo labels for the unlabeled data. The training data are then expanded with the pseudo-labeled unlabeled data, and the pseudo-label data alternately supervise the consistency of the output results. Finally, the output results of the multiple models are integrated to improve the accuracy of the segmentation model. By combining the information of the unlabeled and labeled data, the method directly improves model segmentation performance.
Description
Technical Field
The invention belongs to the technical field of medical image processing, and particularly relates to a tumor image semi-supervised segmentation method based on integrated cross pseudo labels.
Background
In previous consistency-regularization methods, model-perturbation approaches are generally trained using the same model architecture with different initialization parameters. The larger the difference between the models, the more their structural differences perturb the final predictions; at the same time, the information attended to by the different models is complementary during training. The invention therefore combines the advantages of several semi-supervised methods: multiple maximally different models are trained simultaneously, so the pseudo labels they produce differ substantially and the pseudo-label data better expand the training data, while constraining the consistency of the models' outputs ensures a more compact feature space and improves the performance of the segmentation model. In the later optimization stage, the noise in the pseudo labels decreases and the model becomes more stable and accurate, outperforming conventional fully supervised training.
Disclosure of Invention
The invention aims to provide a tumor image semi-supervised segmentation method based on integrated cross pseudo labels with high segmentation precision.
The invention provides a tumor image semi-supervised segmentation method based on an integrated cross pseudo label, which comprises the following specific steps:
(1) Preprocessing the MRI image, including intensity normalization and histogram equalization: the whole image is divided into a number of small tiles and nonlinearly stretched so that the local gray-level histograms are uniformly distributed;
(2) Taking the preprocessed image as the input image, scaling it and feeding it into a U-Net segmentation model for coarse segmentation to locate the target region; then cropping an image centered on the target region from the input image, which increases the proportion of foreground pixels;
(3) Training a segmentation network composed of three models with large architectural differences, namely: first, a UNETR model based on the Transformer architecture; second, a CNN-based attention U-Net; and third, CSA-U-Net, a CNN-based attention segmentation network using multi-scale feature information. These models generate pseudo labels for the unlabeled data; the training data are then expanded with the pseudo-labeled unlabeled data, and the consistency of the output results is alternately supervised and constrained through the pseudo-label data;
training the three maximally different models simultaneously yields pseudo labels that differ substantially, so the pseudo-label data better expand the training data; at the same time, constraining the consistency of the multiple models' outputs ensures a more compact feature space;
(4) Inputting the cropped target region into the trained segmentation network, and integrating the output results of the multiple models to obtain a high-precision segmentation result.
In the invention:
the attention U-Net is formed by adding an attention gating module in a jump connection on the basis of the U-Net.
The attention segmentation network model CSA-U-Net of multi-scale feature information adds an attention gating module on the basis of U-Net: the convolution features of the coarser-scale layer l-1 in the decoder are used as a gating signal to select the spatial region of the feature map in layer l of the encoder, i.e., an attention coefficient is computed over the global range to identify salient image regions, and the feature responses are pruned so that only activations relevant to the specific task are retained. The screened feature maps are then concatenated and fused with the corresponding fine, dense prediction feature maps of the layer-l decoder, to better assist the decoder in target localization and recovery. To exploit multi-scale information, feature maps at different scales are derived in the decoding stage for prediction, i.e., spatial semantic information at different scales is embedded in the computation of the loss function to better supervise the training of the model.
The encoding structure of the UNETR model based on the Transformer architecture consists of stacked pure Transformer modules. The decoding structure is similar to the U-Net architecture: two layers of 3×3×3 convolution operations are applied and the output is upsampled using deconvolution until the feature map returns to the original input resolution. Finally, the segmentation prediction is output through a 1×1×1 convolutional layer with Softmax activation.
Furthermore, the labeled data among the preprocessed images are input into the three segmentation network models with large architectural differences, and the three segmentation models are obtained through training. The unlabeled data are then input into the three trained segmentation models, which output three prediction probability results P1, P2 and P3 respectively; taking the argmax of each yields the corresponding one-hot coded labels Y1, Y2 and Y3. Suppose the model with the best validation score is f(θ3) and the second-best model is f(θ2); then the pseudo label Y3 output by f(θ3) supervises the predicted probability results of the other two models, and the pseudo label Y2 output by f(θ2) supervises P3, which minimizes the noise in the pseudo labels. Back-propagation then iteratively optimizes all three models.
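For illustration only (this sketch is not part of the claimed method, and all function and variable names are assumptions), the argmax-based one-hot pseudo labels and the best/second-best cross-supervision assignment described above can be sketched in numpy as follows:

```python
import numpy as np

def one_hot_pseudo_label(prob):
    """argmax over the class axis of a (C, H, W) probability map -> one-hot (C, H, W)."""
    c = prob.shape[0]
    hard = prob.argmax(axis=0)                    # (H, W) class indices
    return np.eye(c)[hard].transpose(2, 0, 1)     # back to channel-first layout

def cross_supervision_pairs(probs, val_scores):
    """Rank models by validation score; the best model's pseudo label supervises
    the other two, and the second-best model's pseudo label supervises the best.
    Returns one (pseudo_label, supervised_probability) pair per model."""
    order = np.argsort(val_scores)[::-1]          # best model first
    best, second = order[0], order[1]
    pairs = []
    for i in range(len(probs)):
        if i == best:
            pairs.append((one_hot_pseudo_label(probs[second]), probs[best]))
        else:
            pairs.append((one_hot_pseudo_label(probs[best]), probs[i]))
    return pairs
```

The pairs can then be fed to any segmentation loss for the unlabeled branch of training.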
Further, when embedding the spatial semantic information of different scales, all voxels are resampled to a uniform size by third-order spline interpolation.
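A minimal sketch of this resampling with scipy (the 1 mm isotropic target spacing is an assumed example, not taken from the patent; labels use nearest-neighbor interpolation as described later in the embodiment):

```python
import numpy as np
from scipy.ndimage import zoom

def resample_image(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a voxel volume to uniform spacing with a third-order spline."""
    factors = np.asarray(spacing) / np.asarray(new_spacing)
    return zoom(volume, factors, order=3)

def resample_label(label, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Nearest-neighbor (order=0) resampling keeps label values discrete."""
    factors = np.asarray(spacing) / np.asarray(new_spacing)
    return zoom(label, factors, order=0)
```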
Further, in model training, the loss function is the combination of the Dice and cross-entropy losses, i.e. L_Dice + L_CE. The total loss value is the sum of the loss values on labeled and unlabeled data, Loss = L_labeled + αL_unlabeled, where α represents a weight.
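The combined loss can be sketched in numpy as follows (a hedged illustration: the exact Dice formulation and the value of α are not specified by the patent, so the soft-Dice variant and the default α below are assumptions):

```python
import numpy as np

def dice_ce_loss(prob, target_onehot, eps=1e-6):
    """L_Dice + L_CE for a (C, ...) softmax probability map and a one-hot target
    of the same shape. Soft Dice is averaged over classes, CE over voxels."""
    axes = tuple(range(1, prob.ndim))
    inter = (prob * target_onehot).sum(axis=axes)
    dice = (2 * inter + eps) / (prob.sum(axis=axes) + target_onehot.sum(axis=axes) + eps)
    l_dice = 1.0 - dice.mean()
    l_ce = -(target_onehot * np.log(prob + eps)).sum(axis=0).mean()
    return l_dice + l_ce

def total_loss(loss_labeled, loss_unlabeled, alpha=0.5):
    """Loss = L_labeled + alpha * L_unlabeled (alpha=0.5 is an assumed example)."""
    return loss_labeled + alpha * loss_unlabeled
```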
Further, in model training, the AdaBound optimization algorithm is used for iterative optimization: the weights are continuously updated and the loss value gradually decreases, until the segmentation network model for the target task is fully trained.
Further, in each training iteration, the input image is subjected to random transformations, including random rotation, cropping, scaling and flipping operations, to augment the data.
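A simplified numpy stand-in for these random transforms (for brevity, only 90-degree rotations, flips and crops are shown; interpolated rotation and scaling are omitted, and the crop margin is an assumed example):

```python
import numpy as np

def random_augment(img, rng):
    """Apply a random 90-degree rotation, an optional flip, and a random crop
    to a 2D image; a lightweight surrogate for the per-iteration transforms."""
    img = np.rot90(img, k=rng.integers(4))            # random 90-degree rotation
    if rng.random() < 0.5:
        img = np.flip(img, axis=rng.integers(img.ndim))
    h, w = img.shape
    ch, cw = max(1, h - 2), max(1, w - 2)             # crop 2 pixels per dimension
    y = rng.integers(h - ch + 1)
    x = rng.integers(w - cw + 1)
    return img[y:y + ch, x:x + cw]
```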
Further, the output results of the multiple models are not integrated by simple traditional voting. Instead, starting from the output of the model that performs best on the validation set, all voxels are traversed; when a voxel is predicted as 1 by the other two models but as 0 by the best model, the output at that voxel is changed to 1. Integrating the results in this way yields the final segmentation result.
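This voxel-wise integration rule can be sketched as follows (function and argument names are illustrative):

```python
import numpy as np

def integrate(best, other1, other2):
    """Start from the best model's binary mask and flip a voxel from 0 to 1 only
    when BOTH other models predict 1 there, per the rule described above."""
    best = best.astype(bool)
    promote = (~best) & other1.astype(bool) & other2.astype(bool)
    return (best | promote).astype(np.uint8)
```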
The invention has the beneficial effects that: in this tumor semi-supervised segmentation method based on integrated cross pseudo labels, three models with large architectural differences are trained to generate pseudo labels for the unlabeled data; the training data are then expanded with the pseudo-labeled unlabeled data, and the pseudo-label data alternately supervise and constrain the consistency of the output results. Finally, the output results of the multiple models are integrated to further improve the precision of the segmentation model.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the attention U-Net of the present invention.
FIG. 3 is a schematic diagram of an attention segmentation network model CSA-U-Net of multi-scale feature information in the present invention.
FIG. 4 is a schematic diagram of a UNETR model based on a Transformer architecture in the present invention.
Fig. 5 is a diagram illustrating an example of segmentation results of different loss functions.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
The method generates pseudo labels for the unlabeled data by training three models with large architectural differences (one based on the Transformer architecture and two based on convolutional neural networks, CNNs), then expands the training data with the pseudo-labeled unlabeled data, and alternately supervises and constrains the consistency of the output results with the pseudo-label data. Finally, the output results of the multiple models are integrated to improve the precision of the segmentation model; combining the information of the unlabeled and labeled data directly improves segmentation performance.
The invention discloses a tumor semi-supervised segmentation method based on integrated cross pseudo labels, which comprises the following steps of:
Step 1, the training data set: the image data are sagittal T2WI scans of 296 rectal cancer patients collected at Zhongshan Hospital of Fudan University from 2019 to 2021, of which 135 cases are labeled and 161 cases are unlabeled. In addition to the preprocessing of the images, to enable the network to learn spatial semantics correctly, all voxels are resampled to a uniform size by third-order spline interpolation; nearest-neighbor interpolation is used for the corresponding segmentation labels. In each training iteration the data are augmented mainly by random rotation, cropping, scaling and flipping operations.
The attention U-Net adds an attention gating module to the skip connections of U-Net (see FIG. 2). The attention gating module uses the convolution features of the coarser-scale layer l-1 in the decoder as a gating signal to select the spatial region of the feature map in layer l of the encoder, i.e., it computes an attention coefficient over the global range to identify salient image regions and prunes the feature responses so that only activations relevant to the specific task are retained. The screened feature maps are then concatenated and fused with the corresponding fine, dense prediction feature maps of the layer-l decoder, to better assist the decoder in target localization and recovery.
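For illustration, an additive attention gate in the spirit of this module can be sketched in numpy on flattened feature vectors (the 1×1 convolutions are replaced by plain matrix multiplications; all weight shapes are illustrative assumptions, not the patent's parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(x, g, W_x, W_g, psi):
    """Additive attention gate: the decoder gating signal g selects spatial
    positions of the encoder feature map x. x, g: (P, C) flattened features;
    W_x, W_g: (C, F_int) projections; psi: (F_int, 1) coefficient head."""
    q = np.maximum(x @ W_x + g @ W_g, 0.0)   # ReLU(W_x x + W_g g)
    alpha = sigmoid(q @ psi)                 # (P, 1) attention coefficients in (0, 1)
    return x * alpha                         # prune feature responses position-wise
```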
The attention segmentation network model CSA-U-Net of multi-scale feature information (see FIG. 3) adds an attention gating module on the basis of U-Net and exploits multi-scale information: every scale output by the attention gating module is upsampled to a segmentation probability map of the same size as the input image, the loss between each map and the manual annotation is computed, and the loss values are summed for back-propagation to update the network weights. In other words, spatial semantic information at different scales is embedded in the computation of the loss function to better supervise the training of the model.
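The multi-scale deep supervision just described (upsample each scale's probability map to the input resolution, compute its loss against the annotation, and sum) can be sketched as follows; nearest-neighbor upsampling is used here purely for simplicity, not because the patent specifies it:

```python
import numpy as np

def upsample_nearest(prob, factor):
    """Nearest-neighbor upsampling of a (H, W) map by an integer factor."""
    return np.repeat(np.repeat(prob, factor, axis=0), factor, axis=1)

def deep_supervision_loss(scale_probs, target, loss_fn):
    """Sum the loss over probability maps predicted at several decoder scales,
    each brought to the target resolution before comparison with the manual
    annotation. loss_fn is any (prob, target) -> float."""
    size = target.shape[0]
    total = 0.0
    for p in scale_probs:
        total += loss_fn(upsample_nearest(p, size // p.shape[0]), target)
    return total
```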
Fig. 4 shows the UNETR model based on the Transformer architecture. The encoding structure consists of stacked pure Transformer modules, each composed mainly of a Multi-head Self-Attention (MSA) block and a Multi-Layer Perceptron (MLP). The MSA consists of n parallel self-attention heads; each head is a parameterized function that, for an input sequence z ∈ R^(N×K), learns the similarity between the query (q) and the key (k), from which the attention weights W_att of the input sequence z are computed. The decoding structure is similar to the U-Net architecture: two layers of 3×3×3 convolution operations are applied and the output is upsampled using deconvolution until the feature map reverts to the original input resolution. Finally, the segmentation prediction is output through a 1×1×1 convolutional layer with Softmax activation.
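A minimal numpy sketch of the scaled dot-product attention weights W_att and of n parallel heads (the output projection back to the model dimension is omitted, and all shapes are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_weights(z, W_q, W_k):
    """Scaled dot-product attention weights W_att for one head: similarity of
    the queries q = z W_q and keys k = z W_k of a sequence z in R^{N x K}."""
    q, k = z @ W_q, z @ W_k
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d))       # (N, N), rows sum to 1

def msa(z, head_params):
    """n parallel self-attention heads: each reweights its own value projection
    of z by W_att; the head outputs are concatenated along the feature axis."""
    outs = []
    for W_q, W_k, W_v in head_params:          # one (W_q, W_k, W_v) triple per head
        w_att = self_attention_weights(z, W_q, W_k)
        outs.append(w_att @ (z @ W_v))
    return np.concatenate(outs, axis=-1)
```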
Step 4: following the training process described above, the AdaBound optimization algorithm is used for iterative optimization; the weights are continuously updated and the loss value gradually decreases, finally yielding the trained segmentation models for the three target tasks.
Step 5: the test image is input into the three segmentation network models in turn, and inference produces three complete segmentation results.
Step 6: the three segmentation results are integrated. Starting from the output of the model that performs best on the validation set, all voxels are traversed; when a voxel is predicted as 1 by the other two models but as 0 by the best model, the output at that voxel is changed to 1.
First, the influence of the amount of unlabeled data on model segmentation performance is compared; the segmentation results of the different models and the influence of the post-processing method are also evaluated. As shown in Table 1, when the amount of unlabeled data is 40 cases, the noise of the pseudo labels has little influence, and the models acquire useful information from the unlabeled data and improve segmentation accuracy considerably. As the amount of unlabeled data increases, the accuracy of each model decreases; when it reaches 120 cases, segmentation accuracy is even lower than the fully supervised result, indicating that the noise introduced by the unlabeled data outweighs the useful information it provides. Accuracy rises again when the unlabeled data reach 161 cases, likely because the unlabeled data gradually form a low-density region and the influence of noise decreases. Regardless of the amount of unlabeled data, the post-processed, integrated segmentation result is clearly better than that of any single segmentation model.
TABLE 1
Table 2.
The performance of the proposed method (denoted ECP) is also compared with several classical semi-supervised learning methods. As shown in Table 2, the CSA-U-Net model is used as the backbone network for every method, and the comparisons cover the entropy-regularization method EM; the model-perturbation method CPS; the feature-perturbation method CCT; the input- and model-perturbation methods Mean Teacher (MT), Uncertainty-Aware Mean Teacher (UAMT) and Uncertainty Rectified Pyramid Consistency (URPC); and the task-consistency method DTC. The segmentation results are visualized in Fig. 5. The experimental results of this embodiment show that the method achieves higher and more stable tumor segmentation precision; by combining the information of the unlabeled and labeled data, the tumor semi-supervised segmentation method with integrated cross pseudo labels directly improves the segmentation performance of the model.
The above embodiments are described in detail and concretely, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent should be determined by the appended claims.
Claims (8)
1. A tumor image semi-supervised segmentation method based on integrated cross pseudo labels is characterized by comprising the following specific steps:
(1) Preprocessing an MRI image, including intensity normalization and histogram equalization of the image, dividing the whole image into a certain number of small pixels, and performing nonlinear stretching to uniformly distribute local gray level histograms;
(2) Taking the preprocessed image as an input image, zooming and inputting the input image into a segmentation model U-Net for rough segmentation, and positioning a target region; then cutting out an image taking the target area as the center from the input image, and increasing the ratio of foreground pixels;
(3) Then training a segmentation network model formed by three models with larger differences, wherein the three models are respectively as follows: the method comprises the following steps of A, constructing a UNETR model based on a Transformer framework, an attention U-Net based on CNN, and an attention segmentation network model CSA-U-Net based on multi-scale feature information of CNN; a pseudo label used for generating label-free data; then expanding training data by using label-free data with a pseudo label, and alternately supervising and constraining the consistency of output results through the pseudo label data;
(4) And inputting the cut target area into a trained segmentation network model, and integrating output results of a plurality of models to obtain a high-precision segmentation result.
2. The tumor image semi-supervised segmentation method according to claim 1, wherein the attention U-Net is formed by adding an attention gating module to the skip connections of U-Net;
the attention segmentation network model CSA-U-Net of the multi-scale feature information is characterized in that an attention gating module is added on the basis of U-Net, the convolution features of a layer with a thicker scale l-1 in a decoder are used as gating signals to select a space region of a feature map in a layer l of an encoder, namely, an attention coefficient is calculated in a global range to identify a significant image region, and feature response is trimmed to only reserve activation related to a specific task; then splicing and fusing the screened feature maps and the corresponding l-layer decoder fine and dense prediction feature maps so as to better assist the decoder in carrying out target positioning and recovery; by utilizing multi-scale information, feature maps with different scales are derived in a decoding stage for prediction, namely, spatial semantic information with different scales is embedded in the process of calculating a loss function so as to better supervise the training of the model;
the encoding structure of the UNETR model based on the Transformer architecture is composed of stacked pure Transformer modules, the decoding structure is similar to the U-Net architecture, the output of the UNETR model is up-sampled by two layers of convolution operations of 3 multiplied by 3, and deconvolution is used until the feature map returns to the resolution of the original input; and finally, outputting a final segmentation prediction result through the 1 × 1 × 1 convolutional layer with Softmax activation.
3. The semi-supervised segmentation method for tumor images as claimed in claim 2, wherein the labeled data in the preprocessed images are input into the three segmentation network models with large architectural differences, and the three segmentation models are obtained through training; the unlabeled data are input into the three trained segmentation models, which output three prediction probability results P1, P2 and P3 respectively, and the argmax operation yields the corresponding one-hot coded labels Y1, Y2 and Y3; suppose the model with the best validation score is f(θ3) and the second-best model is f(θ2); then the pseudo label Y3 output by f(θ3) supervises the predicted probability results of the other two models, and the pseudo label Y2 output by f(θ2) supervises P3, so that the noise in the pseudo labels is reduced to the greatest extent; back-propagation then iteratively optimizes the three models.
4. The semi-supervised segmentation method for tumor images as claimed in claim 3, wherein when embedding spatial semantic information of different scales, a third-order spline interpolation method is adopted to resample all voxels to a uniform size.
5. The semi-supervised segmentation method for tumor images as claimed in claim 4, wherein in model training the loss function is the combination of the Dice and cross-entropy losses, i.e. L_Dice + L_CE; the total loss value is the sum of the loss values on labeled and unlabeled data, Loss = L_labeled + αL_unlabeled, where α represents a weight.
6. The semi-supervised segmentation method for tumor images as recited in claim 5, wherein in model training the AdaBound optimization algorithm is used for iterative optimization: the weights are continuously updated and the loss value gradually decreases, until the segmentation network model for the target task is fully trained.
7. The method of semi-supervised segmentation of tumor images as set forth in claim 2, wherein the input images are subjected to random transformations, including random rotation, cropping, scaling and flipping operations, in each training iteration to augment the data.
8. The semi-supervised segmentation method for tumor images as claimed in claim 2, wherein integrating the output results of the multiple models comprises traversing all voxels starting from the output of the model that performs best on the validation set; when a voxel is predicted as 1 by the other two models but as 0 by the best model, the output at that voxel is changed to 1; integrating the results in this way yields the final segmentation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210940799.7A CN115393289A (en) | 2022-08-06 | 2022-08-06 | Tumor image semi-supervised segmentation method based on integrated cross pseudo label |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210940799.7A CN115393289A (en) | 2022-08-06 | 2022-08-06 | Tumor image semi-supervised segmentation method based on integrated cross pseudo label |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115393289A true CN115393289A (en) | 2022-11-25 |
Family
ID=84119586
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210940799.7A Pending CN115393289A (en) | 2022-08-06 | 2022-08-06 | Tumor image semi-supervised segmentation method based on integrated cross pseudo label |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115393289A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117173401A (en) * | 2022-12-06 | 2023-12-05 | 南华大学 | Semi-supervised medical image segmentation method and system based on cross guidance and feature level consistency dual regularization |
CN117237648A (en) * | 2023-11-16 | 2023-12-15 | 中国农业科学院农业资源与农业区划研究所 | Training method, device and equipment of semantic segmentation model based on context awareness |
CN117611601A (en) * | 2024-01-24 | 2024-02-27 | 中国海洋大学 | Text-assisted semi-supervised 3D medical image segmentation method |
-
2022
- 2022-08-06 CN CN202210940799.7A patent/CN115393289A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117173401A (en) * | 2022-12-06 | 2023-12-05 | 南华大学 | Semi-supervised medical image segmentation method and system based on cross guidance and feature level consistency dual regularization |
CN117173401B (en) * | 2022-12-06 | 2024-05-03 | 南华大学 | Semi-supervised medical image segmentation method and system based on cross guidance and feature level consistency dual regularization |
CN117237648A (en) * | 2023-11-16 | 2023-12-15 | 中国农业科学院农业资源与农业区划研究所 | Training method, device and equipment of semantic segmentation model based on context awareness |
CN117237648B (en) * | 2023-11-16 | 2024-02-23 | 中国农业科学院农业资源与农业区划研究所 | Training method, device and equipment of semantic segmentation model based on context awareness |
CN117611601A (en) * | 2024-01-24 | 2024-02-27 | 中国海洋大学 | Text-assisted semi-supervised 3D medical image segmentation method |
CN117611601B (en) * | 2024-01-24 | 2024-04-23 | 中国海洋大学 | Text-assisted semi-supervised 3D medical image segmentation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115393289A (en) | Tumor image semi-supervised segmentation method based on integrated cross pseudo label | |
CN111461232A (en) | Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning | |
CN111062395B (en) | Real-time video semantic segmentation method | |
CN113362242B (en) | Image restoration method based on multi-feature fusion network | |
CN113066089B (en) | Real-time image semantic segmentation method based on attention guide mechanism | |
CN115620010A (en) | Semantic segmentation method for RGB-T bimodal feature fusion | |
CN116469100A (en) | Dual-band image semantic segmentation method based on Transformer | |
CN113240683A (en) | Attention mechanism-based lightweight semantic segmentation model construction method | |
CN115222998A (en) | Image classification method | |
CN116863194A (en) | Foot ulcer image classification method, system, equipment and medium | |
CN110633706B (en) | Semantic segmentation method based on pyramid network | |
CN116363149A (en) | Medical image segmentation method based on U-Net improvement | |
CN114972753A (en) | Lightweight semantic segmentation method and system based on context information aggregation and assisted learning | |
CN117036380A (en) | Brain tumor segmentation method based on cascade transducer | |
CN117132885A (en) | Hyperspectral image classification method, hyperspectral image classification system and storage medium | |
CN116612416A (en) | Method, device and equipment for dividing video target and readable storage medium | |
CN117036368A (en) | Image data processing method, device, computer equipment and storage medium | |
CN114972851A (en) | Remote sensing image-based ship target intelligent detection method | |
CN115544613A (en) | Multi-mode data-driven urban road layout design automation method | |
CN114529564A (en) | Lightweight infant brain tissue image segmentation method based on context information | |
Shanqing et al. | A multi-level feature weight fusion model for salient object detection | |
CN118097340B (en) | Training method, system, equipment and medium for lane image segmentation model | |
Zhou et al. | Semantic segmentation network based on lightweight feature pyramid transformer | |
CN115496910B (en) | Point cloud semantic segmentation method based on full-connected graph coding and double-expansion residual error | |
Wang et al. | Multi-resolution network based image steganalysis model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |