CN113516130B

CN113516130B - Semi-supervised image semantic segmentation method based on entropy minimization

Info

Publication number: CN113516130B
Application number: CN202110811842.5A
Authority: CN
Inventors: 李佐勇; 吴嘉炜; 樊好义; 张晓青; 赖桃桃
Original assignee: Minjiang University
Current assignee: Minjiang University
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2024-01-05
Anticipated expiration: 2041-07-19
Also published as: CN113516130A

Abstract

The invention relates to a semi-supervised image semantic segmentation method based on entropy minimization. First, a feature gradient map regularization strategy (FGMR) is proposed, which uses gradient maps of low-level feature maps in the encoder to strengthen the encoder's encoding of deep feature maps. Then, an adaptive sharpening strategy is proposed to keep the decision boundaries of unlabeled data in low-density regions. To further reduce the influence of noise, a low-confidence consistency strategy is proposed to ensure consistency between classification and segmentation. Extensive experiments show that the algorithm of the invention outperforms existing methods.

Description

Semi-supervised image semantic segmentation method based on entropy minimization

Technical Field

The invention belongs to the technical field of computer vision and concerns semi-supervised image semantic segmentation, which is crucial for segmenting images when only a small amount of annotated data and a large amount of unannotated data are available. In particular, it relates to a semi-supervised image semantic segmentation method based on entropy minimization.

Background

In recent years, the development of deep supervised learning has brought significant progress to many computer vision tasks. However, training a deep neural network requires a large amount of labeled data, which is often time-consuming and expensive to acquire. Semantic segmentation in particular requires a large number of pixel-level labels, whose labeling cost is 15 and 60 times that of region-level and image-level labels, respectively. The increase in cost is even more pronounced for medical image segmentation, because labeling must be done by specialized physicians. Weakly supervised and semi-supervised segmentation methods have therefore attracted attention.

Semi-supervised image semantic segmentation assumes a large amount of unlabeled data and a limited amount of labeled data drawn from the same distribution. Current mainstream semi-supervised segmentation methods fall into two groups: methods based on generative adversarial networks (GANs) and methods based on consistency training. GAN-based approaches extend the generic GAN framework to pixel-level prediction and try to fool the discriminator with fake outputs produced for unlabeled data. Consistency training methods expect the network output to stay smooth under different perturbations. Both approaches have proven effective in semi-supervised image semantic segmentation, but both have limitations. GAN-based methods exploit unlabeled data, but they require carefully designed, task-specific network structures and are difficult to train. Consistency training methods must forward-propagate each perturbed copy of the data separately, which adds computation; moreover, the perturbation implicitly augments the data, which makes comparisons against a fully supervised model trained without data augmentation unfair.

Disclosure of Invention

The invention aims to overcome the above shortcomings by providing a semi-supervised image semantic segmentation method based on entropy minimization. First, a feature gradient map regularization strategy (FGMR) is proposed, which uses gradient maps of low-level feature maps in the encoder to strengthen the encoder's encoding of deep feature maps. Then, an adaptive sharpening strategy is proposed to keep the decision boundaries of unlabeled data in low-density regions. To further reduce the influence of noise, a low-confidence consistency strategy is proposed to ensure consistency between classification and segmentation. The method significantly improves semi-supervised image semantic segmentation performance with essentially no change to the network structure and no additional computational cost.

To achieve the above purpose, the technical solution of the invention is as follows: first, a feature gradient map regularization strategy FGMR is proposed, in which gradient maps of low-level feature maps in the encoder are used to strengthen the encoder's encoding of deep feature maps; then, an adaptive sharpening strategy is proposed to keep the decision boundaries of unlabeled data in low-density regions; and, to further reduce the influence of noise, a low-confidence consistency strategy is proposed to ensure consistency between classification and segmentation.

In one embodiment of the present invention, the method modifies the semi-supervised image semantic segmentation network as follows: assuming the input image size is H×W and the number of categories is C, the network output is changed to the mean $\mu_s \in \mathbb{R}^{H\times W\times C}$ of the segmentation result and the corresponding variance; in addition, the mean $\mu_c \in \mathbb{R}^{C}$ of the classification result and the corresponding variance are output at the last layer of the encoder.

The loss function of the network is also improved. It contains a supervised loss term and an unsupervised loss term:

$L = L_s + \lambda L_u$

where $L_s$ is the supervised loss term, $L_u$ is the unsupervised loss term, and $\lambda$ is a hyperparameter that balances the supervised and unsupervised loss terms;

For labeled data $x_l \in \mathbb{R}^{H\times W\times 3}$, the corresponding segmentation label is $y_s \in \mathbb{R}^{H\times W\times C}$ and the class label is $y_c \in \mathbb{R}^{C}$. $x_l$ is fed into the network to obtain the corresponding means and variances, and the reparameterization trick is used to sample the segmentation prediction $z_s$ and the classification prediction $z_c$; cross-entropy loss is then used to supervise the segmentation result $z_s$ with $y_s$ and the classification result $z_c$ with $y_c$. For labeled data, the loss term is defined as:

$L_s = \sum_{H,W,C} H\big(y_s^{H,W,C}, \alpha_s(z_s^{H,W,C})\big) + \sum_{C} H\big(y_c^{C}, \alpha_c(z_c^{C})\big)$

where $H(\cdot,\cdot)$ is the cross-entropy loss function and $\alpha(\cdot)$ is the activation function of the last layer;

For unlabeled data, the feature gradient map regularization strategy FGMR is first used to enhance the edge gradient values of the feature maps produced by the encoder. Then, the variance is used as aleatoric uncertainty to identify noisy samples and to guide the adaptive sharpening strategy that produces pseudo-labels for the unlabeled data; these possibly noisy pseudo-labels are used to supervise the unlabeled data. Even though the aleatoric uncertainty filters out some noisy samples, noise in the pseudo-labels may still affect the performance of the network. To address this problem, the low-confidence classes in the classification result are further used to suppress the segmentation predictions of the corresponding classes, keeping classification and segmentation consistent at the class level. The adaptive sharpening loss and the class consistency loss counterbalance each other, so that the decision boundary lies in a low-density region and a robust prediction is obtained. The unsupervised loss function is defined as:

where the three terms are the loss terms of the feature gradient map regularization strategy FGMR, the adaptive sharpening strategy (adaptive sharp), and the class consistency strategy (class consistency), respectively.

In an embodiment of the present invention, the specific implementation formula of the feature gradient map regularization strategy FGMR is as follows:

where a gradient operator is applied to the feature maps, $S^e$ is the encoder of the segmentation network, and one of the terms is set not to back-propagate during the training phase.

In an embodiment of the present invention, the adaptive sharpening strategy is specifically implemented as follows:

First, the common sharpening strategy is defined as:

$\mathrm{Sharpen}(p, T)_i = p_i^{1/T} \Big/ \sum_{j=1}^{C} p_j^{1/T}$

where T is a hyperparameter. As T→0, the output of Sharpen(p, T) approaches a Dirac distribution. Because the sharpened result serves as the target for unlabeled data, lowering T encourages the model to produce low-entropy predictions. However, T must be set carefully; in the image segmentation task in particular, it is not reasonable to use the same value of T for all samples;

Therefore, an adaptive sharpening strategy is proposed, which uses the variance as aleatoric uncertainty to filter out noisy samples and adaptively adjusts the T value of each sample according to the confidence of its prediction, such that the lower the confidence, the stronger the sharpening of that sample, namely:

Equations (1) and (2) adaptively generate a pseudo-label for each sample, and the mean squared error loss is then used to optimize the network on the unlabeled data, namely:

In an embodiment of the present invention, the class consistency strategy (class consistency) is implemented by the following formula:

where $p^c = \mathrm{softmax}(\mu_c)$, $p^s = \mathrm{softmax}(\mu_s)$, and $\beta$ is a threshold that determines the low-confidence consistency boundary.

In one embodiment of the invention, the

Compared with the prior art, the invention has the following beneficial effects: it provides a novel semi-supervised image semantic segmentation method based on entropy minimization. Entropy minimization has proven to be an effective semi-supervised way of enforcing the cluster assumption, under which decision boundaries should lie in low-density regions. Specifically, the invention provides a feature gradient map regularization strategy that enlarges the distance between classes in feature space and thereby yields low-entropy segmentation predictions. In addition, an adaptive sharpening strategy based on aleatoric uncertainty and a class consistency regularization strategy are introduced to reduce the interference of noise in the pseudo-labels. Extensive experiments on the PASCAL VOC, PASCAL-Context, and blood leukocyte datasets show that semi-supervised image semantic segmentation performance is significantly improved with essentially no change to the network structure and no additional computational cost.

Drawings

FIG. 1 is a diagram of a network model architecture according to the present invention.

FIG. 2 shows statistics and visualizations of the gradients of the U-Net encoding layers on the leukocyte test set.

FIG. 3 shows partial segmentation results on the PASCAL VOC dataset with 1/8 of the samples labeled.

FIG. 4 shows partial segmentation results on the blood leukocyte dataset with 1/10 of the samples labeled.

Detailed Description

The technical scheme of the invention is specifically described below with reference to the accompanying drawings.

The invention relates to a semi-supervised image semantic segmentation method based on entropy minimization. First, a feature gradient map regularization strategy FGMR is provided, in which gradient maps of low-level feature maps in the encoder are used to strengthen the encoder's encoding of deep feature maps; then, an adaptive sharpening strategy is provided to keep the decision boundaries of unlabeled data in low-density regions; and, to further reduce the influence of noise, a low-confidence consistency strategy is proposed to ensure consistency between classification and segmentation.

The following is a specific embodiment of the present invention.

1. Summary of the method

The invention requires only a slight change to an existing segmentation network and no careful network structure design. The network architecture of the invention is shown in FIG. 1. Assuming the input image size is H×W and the number of categories is C, the specific modifications are: the network output is changed to the mean $\mu_s \in \mathbb{R}^{H\times W\times C}$ of the segmentation result and the corresponding variance, and the mean $\mu_c \in \mathbb{R}^{C}$ of the classification result and the corresponding variance are additionally output at the last layer of the encoder. Apart from these minor changes to the network, the improvement of the algorithm of the invention lies in the loss function of the network, which includes a supervised loss term and an unsupervised loss term:

$L = L_s + \lambda L_u$ (1)

where $L_s$ is the supervised loss term, $L_u$ is the unsupervised loss term, and $\lambda$ is a hyperparameter that balances the supervised and unsupervised loss terms.
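The output change described above can be illustrated with a minimal pure-Python sketch for one pixel of the segmentation branch (the image-level classification head works the same way on a length-C vector). As an assumption for illustration only, the head is taken to emit 2C values per position, read as C means followed by C log-variances; the helper names `split_mean_logvar` and `reparameterize` are hypothetical. The sampling step is the reparameterization trick, z = μ + σ·ε with ε ~ N(0, 1).

```python
import math
import random


def split_mean_logvar(head_output, num_classes):
    """Split a 2C-value head output into a mean vector and a variance vector.

    Hypothetical layout (assumption): the first C values are the mean and the
    last C values are the log-variance, so exp() keeps the variance positive.
    """
    if len(head_output) != 2 * num_classes:
        raise ValueError("expected 2 * num_classes values")
    mean = list(head_output[:num_classes])
    variance = [math.exp(v) for v in head_output[num_classes:]]
    return mean, variance


def reparameterize(mean, variance, rng=random):
    """Reparameterization trick: z = mean + sqrt(variance) * eps, eps ~ N(0, 1)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, variance)]
```

For example, `split_mean_logvar([2.0, -1.0, 0.5, 0.0, 0.0, -20.0], 3)` yields means `[2.0, -1.0, 0.5]` and variances `[1.0, 1.0, ~2e-9]`, so a sample drawn with `reparameterize` stays very close to 0.5 in the third class. Predicting a log-variance rather than a variance is a common design choice because it avoids a positivity constraint on the raw output.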

For labeled data $x_l \in \mathbb{R}^{H\times W\times 3}$, the corresponding segmentation label is $y_s \in \mathbb{R}^{H\times W\times C}$ and the class label is $y_c \in \mathbb{R}^{C}$. $x_l$ is fed into the network to obtain the corresponding means and variances, and the reparameterization trick is used to sample the segmentation prediction $z_s$ and the classification prediction $z_c$. The standard cross-entropy loss is then used to supervise the segmentation result $z_s$ with $y_s$ and the classification result $z_c$ with $y_c$. For labeled data, the loss term is defined as:

$L_s = \sum_{H,W,C} H\big(y_s^{H,W,C}, \alpha_s(z_s^{H,W,C})\big) + \sum_{C} H\big(y_c^{C}, \alpha_c(z_c^{C})\big)$ (2)

where $H(\cdot,\cdot)$ is the cross-entropy loss function and $\alpha(\cdot)$ (that is, $\alpha_s$ and $\alpha_c$) is the activation function of the last layer.
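Equation (2) can be sketched in plain Python on small nested lists. This is an illustrative sketch, not the patent's implementation: it assumes $\alpha_s$ is a per-pixel softmax over the C classes and, because $y_c$ is an image-level label that may mark several classes, that $\alpha_c$ is a per-class sigmoid paired with binary cross-entropy. The function name `supervised_loss` is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def supervised_loss(seg_logits, seg_onehot, cls_logits, cls_labels, eps=1e-12):
    """Sketch of Eq. (2): summed pixel-wise CE plus image-level classification CE.

    seg_logits, seg_onehot: lists over pixels of length-C vectors (z_s and y_s).
    cls_logits, cls_labels: length-C vectors (z_c and a multi-hot y_c).
    Assumptions: alpha_s = softmax, alpha_c = sigmoid with binary cross-entropy.
    """
    seg_term = 0.0
    for logits, target in zip(seg_logits, seg_onehot):
        probs = softmax(logits)
        seg_term -= sum(t * math.log(p + eps) for t, p in zip(target, probs))
    cls_term = 0.0
    for logit, target in zip(cls_logits, cls_labels):
        p = sigmoid(logit)
        cls_term -= target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps)
    return seg_term + cls_term
```

With all-zero logits, one pixel labeled as class 0 and one image-level class marked present, each term contributes log 2, so the loss is 2·log 2 ≈ 1.386.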

For unlabeled data, feature gradient map regularization (FGMR) is used to enhance the edge gradient values of the feature maps produced by the encoder. The variance is then used as aleatoric uncertainty to identify noisy samples and to guide the adaptive sharpening that produces pseudo-labels for the unlabeled data; these possibly noisy pseudo-labels are used to supervise the unlabeled data. Even though the aleatoric uncertainty filters out some noisy samples, noise in the pseudo-labels may still affect the performance of the network. To address this problem, the low-confidence categories in the classification result are further used to suppress the segmentation predictions of the corresponding categories, keeping classification and segmentation consistent at the category level. The adaptive sharpening loss and the class consistency loss counterbalance each other, so that the decision boundary lies in a low-density region and a robust prediction is obtained. The unsupervised loss function is defined as:

where the three terms are the loss terms of feature gradient map regularization (FGMR), adaptive sharpening (adaptive sharp), and class consistency (class consistency), respectively.

2. Feature gradient map regularization

As shown in FIG. 2, gradient statistics for different encoder layers show that the encoder's ability to extract edge information increases progressively from lower to higher layers. After consistency training, the average gradient of each encoder layer is significantly enhanced. These results indicate that a good segmentation network tends to find more edge information in order to improve segmentation accuracy. Inspired by these observations, a key question for semantic segmentation is how to improve the encoder's ability to discern object edges. As shown in FIG. 2(b) and FIG. 2(c), the gradient information of edges in the deep encoder layers is significantly enhanced after consistency training [1], which indicates that consistency training is effective because it makes the encoder better at resolving edges. The feature gradient map regularization is therefore designed as follows, combining the progressive nature of gradient information across encoding layers with the goal of improving edge discrimination:

where a gradient operator is applied to the feature maps, $S^e$ is the encoder of the segmentation network, and one of the terms is set not to back-propagate during the training phase.
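The FGMR formula itself is not reproduced in this text, so the sketch below only illustrates its basic ingredient: the gradient map of one feature-map channel and its spatial average, the kind of per-layer average gradient statistic discussed for FIG. 2. The forward-difference gradient operator with replicated borders and the function names `gradient_map` and `mean_gradient` are assumptions for illustration; the regularization term that combines gradient maps across encoder layers is not shown.

```python
import math


def gradient_map(feature_map):
    """Gradient magnitude of one 2-D feature-map channel (list of equal-length rows).

    Assumption: forward differences, with the last row/column replicated so the
    border gradient is zero; the patent text does not specify the operator.
    """
    h, w = len(feature_map), len(feature_map[0])
    grad = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = feature_map[i][min(j + 1, w - 1)] - feature_map[i][j]
            gy = feature_map[min(i + 1, h - 1)][j] - feature_map[i][j]
            grad[i][j] = math.hypot(gx, gy)
    return grad


def mean_gradient(feature_map):
    """Spatial average of the gradient map: one scalar statistic per channel/layer."""
    grad = gradient_map(feature_map)
    return sum(sum(row) for row in grad) / (len(grad) * len(grad[0]))
```

A flat map has a mean gradient of 0, while a 2×4 map containing one vertical step edge, `[[0, 0, 1, 1], [0, 0, 1, 1]]`, has a mean gradient of 0.25: the edge pixels carry all of the gradient energy, which is the edge information the regularization is meant to strengthen.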

3. Adaptive sharpening

The sharpening strategy proposed in the MixMatch algorithm [2] is used to reduce the entropy of the label distribution; it works by adjusting the "temperature" of the categorical distribution. The sharpening strategy is defined as follows:

$\mathrm{Sharpen}(p, T)_i = p_i^{1/T} \Big/ \sum_{j=1}^{C} p_j^{1/T}$

where T is a hyperparameter. As T→0, the output of Sharpen(p, T) approaches a Dirac distribution. Because the sharpened result serves as the target for unlabeled data, lowering T encourages the model to produce low-entropy predictions. However, T must be set carefully; in the image segmentation task in particular, it is not reasonable to use the same value of T for all samples.
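The MixMatch sharpening operation referred to above can be written in a few lines of Python. This sketch applies it to a single probability vector (for example, one pixel's class distribution); the function name `sharpen` is an illustrative choice.

```python
def sharpen(p, T):
    """MixMatch-style sharpening: Sharpen(p, T)_i = p_i**(1/T) / sum_j p_j**(1/T)."""
    if T <= 0:
        raise ValueError("temperature T must be positive")
    powered = [x ** (1.0 / T) for x in p]
    total = sum(powered)
    return [x / total for x in powered]
```

With T = 0.5, the distribution [0.6, 0.4] becomes [9/13, 4/13] ≈ [0.692, 0.308]; T = 1 leaves it unchanged, and as T shrinks further the output approaches a one-hot vector, matching the Dirac limit described above.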

Therefore, the adaptive sharpening proposed by the invention uses the variance predicted by the network as aleatoric uncertainty to filter out noisy samples, and adaptively adjusts the T value of each sample according to the confidence of its prediction, such that the lower the confidence, the stronger the sharpening of the sample, namely:

Equations 7 and 5 adaptively generate a pseudo-label for each sample, and the mean squared error loss is then used to optimize the network on the unlabeled data, namely:

The adaptive sharpening provided by the invention makes the network model focus more on non-noisy samples and hard-to-classify samples, and less on noisy samples and easy-to-classify samples.
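Because the adaptive temperature formula is not reproduced in this text, the following is only a hypothetical sketch of the idea described above, not the invention's formula. It assumes that (1) pixels whose mean predicted variance exceeds a hypothetical `var_threshold` are treated as noisy and excluded, (2) the per-pixel temperature is `t_scale * max(p)`, so less confident pixels get a lower T and are sharpened more strongly, and (3) the sharpened result is treated as a fixed pseudo-label and compared with the prediction by mean squared error. The names `adaptive_sharpen_loss`, `t_scale`, and `var_threshold` are illustrative.

```python
def sharpen(p, T):
    """MixMatch-style sharpening of one probability vector."""
    powered = [x ** (1.0 / T) for x in p]
    total = sum(powered)
    return [x / total for x in powered]


def adaptive_sharpen_loss(probs, variances, t_scale=0.5, var_threshold=0.1):
    """Hypothetical adaptive-sharpening loss over unlabeled pixels (assumed form).

    probs: per-pixel class probabilities, e.g. softmax of mu_s.
    variances: per-pixel predicted variance vectors.
    Assumptions: pixels with mean variance > var_threshold are skipped as noisy;
    T = t_scale * max(p), so lower confidence gives a lower T and stronger
    sharpening; the pseudo-label is a constant target for an MSE loss.
    """
    total, kept = 0.0, 0
    for p, v in zip(probs, variances):
        if sum(v) / len(v) > var_threshold:
            continue  # high aleatoric uncertainty: no pseudo-label for this pixel
        target = sharpen(p, t_scale * max(p))  # would be detached in a real framework
        total += sum((a - b) ** 2 for a, b in zip(p, target)) / len(p)
        kept += 1
    return total / kept if kept else 0.0
```

Under these assumptions, a less confident pixel such as [0.6, 0.4] incurs a larger loss than a confident one such as [0.9, 0.1], which reflects the stated goal of focusing on hard-to-classify samples, while a pixel with high predicted variance contributes nothing.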

4. Class consistency

Because strong sharpening of hard samples may introduce additional noise into the network, an additional noise-smoothing strategy is required. Owing to imbalanced class distributions and the limited number of samples, the high-confidence predictions of the neural network are not guaranteed to be accurate and can easily mislead the segmentation result, whereas its low-confidence predictions tend to be correct. Classification and segmentation are therefore expected to agree on low-confidence predictions rather than on high-confidence ones. The loss function can be expressed as:

where $p^c = \mathrm{softmax}(\mu_c)$, $p^s = \mathrm{softmax}(\mu_s)$, and $\beta$ is a threshold that determines the low-confidence consistency boundary and is set in the invention
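The class consistency formula is likewise not reproduced here. The sketch below shows one hypothetical way to express "suppress the segmentation predictions of low-confidence classes": classes whose image-level probability $p^c_k$ falls below $\beta$ are collected, and the average segmentation probability that pixels assign to those classes is used as a penalty. The penalty form and the function name `class_consistency_loss` are assumptions, not the patented loss.

```python
def class_consistency_loss(cls_probs, seg_probs, beta):
    """Hypothetical class-consistency penalty (assumed form, not the patented loss).

    cls_probs: image-level class probabilities p^c = softmax(mu_c).
    seg_probs: per-pixel class probabilities p^s = softmax(mu_s).
    Classes with p^c_k < beta are treated as low-confidence (likely absent), and
    the average segmentation probability assigned to them is penalized.
    """
    low = [k for k, pc in enumerate(cls_probs) if pc < beta]
    if not low or not seg_probs:
        return 0.0
    penalty = sum(sum(p[k] for k in low) for p in seg_probs)
    return penalty / len(seg_probs)
```

For example, with image-level probabilities [0.7, 0.28, 0.02] and β = 0.05, only class 2 is low-confidence, so two pixels that assign it 0.2 and 0.0 give a penalty of 0.1; raising the threshold admits more classes into the penalty, lowering it admits fewer.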

5. Experimental data and evaluation

The PASCAL VOC, PASCAL-Context, and blood leukocyte datasets were used to evaluate the performance of the algorithm of the invention. The PASCAL VOC dataset consists of 21 classes (including background). The invention applies data augmentation to the training set; the augmented dataset contains 10582 training images and 1449 validation images. The PASCAL-Context dataset is a whole-scene parsing dataset containing 4998 training images and 5105 test images with dense semantic labels. Following previous work [6-3], the invention uses the semantic labels of the 60 most common classes, including the background class. The blood leukocyte image dataset contains 3 categories and was collected from a hospital; it contains 500 training images of size 256×256 and 500 test images of the same size.

Mean intersection over union (mIoU) is used as the evaluation metric for PASCAL VOC and PASCAL-Context, and the F1 score, recall, accuracy, and precision are used as the evaluation metrics for the blood leukocyte dataset.
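The evaluation metrics named above have standard definitions, sketched here in plain Python over flattened per-pixel label lists. The text does not state how per-class scores are averaged for the leukocyte dataset, so `binary_scores` computes precision, recall, and F1 for one chosen positive class together with overall pixel accuracy; the function names and that choice are assumptions.

```python
def confusion_matrix(gt, pred, num_classes):
    """cm[g][p] counts pixels with ground truth g predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(gt, pred, num_classes):
    """Mean of per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    cm = confusion_matrix(gt, pred, num_classes)
    ious = []
    for k in range(num_classes):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[r][k] for r in range(num_classes)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


def binary_scores(gt, pred, positive):
    """Precision, recall, F1 for one positive class, plus overall pixel accuracy."""
    tp = sum(1 for g, p in zip(gt, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gt, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gt, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for g, p in zip(gt, pred) if g == p) / len(gt)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

For ground truth [0, 0, 1, 1, 2, 2] and prediction [0, 1, 1, 1, 2, 0], the per-class IoUs are 1/3, 2/3, and 1/2, giving an mIoU of 0.5; for class 1, precision is 2/3, recall is 1, F1 is 0.8, and pixel accuracy is 4/6.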

6. Ablation study

Table 1. Contribution of each loss term on the PASCAL VOC dataset with 1/8 labeled data

The method of the invention consists of three loss terms, so the effectiveness of each loss term and of their combinations was examined. The ablation results are shown in Table 1, where CE, Sharpen, AS, CC, and FGMR denote cross-entropy, sharpening, adaptive sharpening, class consistency, and feature gradient map regularization, respectively. The three loss terms effectively improve performance, and adding feature gradient map regularization improves it further.

7. Qualitative and quantitative comparison

Table 2 gives the evaluation results on the PASCAL VOC and PASCAL-Context datasets. Compared with the baseline method (DeepLabv2), the algorithm of the invention improves performance by 2.4% to 7.7% across different splits of labeled and unlabeled data. The method of Hung et al. and the s4GAN method are representative semi-supervised image semantic segmentation methods from the last two years. Under the same experimental setup, the algorithm of the invention achieves the best results on the PASCAL VOC dataset with 1/3, 1/8, and 1/20 of the samples labeled and on the PASCAL-Context dataset with 1/3 and 1/8 of the samples labeled. FIG. 3 shows qualitative results on the PASCAL VOC dataset with 1/8 of the samples labeled.

Table 2. Comparison of segmentation results of different methods on the PASCAL VOC and PASCAL-Context datasets

To further demonstrate the generality of the algorithm, it was tested without data augmentation on the white blood cell dataset with 1/10 of the samples labeled. The data in Table 3 show that, relative to the baseline (U-Net) method, the algorithm of the invention improves the F1 score by 2.23%, recall by 1.67%, precision by 2.46%, and accuracy by 0.95%. Compared with current state-of-the-art semi-supervised medical image segmentation methods, the algorithm achieves the best segmentation results at the lowest cost. FIG. 4 shows partial segmentation results on the white blood cell dataset with 1/10 of the samples labeled; for white blood cell images whose cytoplasm is close to the background in appearance, the algorithm of the invention effectively segments the cytoplasm.

Table 3. Comparison of semi-supervised segmentation performance on the leukocyte dataset using 1/10 labeled samples

8. Spatial complexity comparison

Table 4. Spatial complexity comparison on the PASCAL VOC dataset

As shown by the spatial complexity comparison on the PASCAL VOC dataset in Table 4, the algorithm of the invention adds only 1.16M parameters to those of the baseline (DeepLabv2), whereas the methods of Hung et al. and Mittal et al. add 2.78M additional parameters. The algorithm of the invention therefore requires fewer than half as many additional parameters as the compared methods.
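The "fewer than half" comparison follows directly from the parameter counts stated above:

```latex
\frac{1.16\,\mathrm{M}}{2.78\,\mathrm{M}} \approx 0.417 < 0.5
```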

References:

[1] Shuai Chen, Gerda Bortsova, Antonio García-Uceda Juárez, et al., "Multi-task attention-based semi-supervised learning for medical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019, pp. 457–465.

[2] David Berthelot, Nicholas Carlini, Ian Goodfellow, et al., "MixMatch: A holistic approach to semi-supervised learning," arXiv preprint arXiv:1905.02249, 2019.

[3] Sudhanshu Mittal, Maxim Tatarchenko, and Thomas Brox, "Semi-supervised semantic segmentation with high- and low-level consistency," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

[4] Wei-Chih Hung, Yi-Hsuan Tsai, Yan-Ting Liou, Yen-Yu Lin, and Ming-Hsuan Yang, "Adversarial learning for semi-supervised semantic segmentation," in 29th British Machine Vision Conference, BMVC 2018, 2019.

[5] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.

[6] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.

[7] Shuai Chen, Gerda Bortsova, Antonio García-Uceda Juárez, Gijs van Tulder, and Marleen de Bruijne, "Multi-task attention-based semi-supervised learning for medical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 457–465.

[8] Lequan Yu, Shujun Wang, Xiaomeng Li, Chi-Wing Fu, and Pheng-Ann Heng, "Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 605–613.

The above is a preferred embodiment of the present invention. All changes made according to the technical solution of the present invention whose resulting functional effects do not exceed the scope of that technical solution belong to the protection scope of the present invention.

Claims

1. A semi-supervised image semantic segmentation method based on entropy minimization, characterized in that: first, a feature gradient map regularization strategy FGMR is provided, which uses gradient maps of low-level feature maps in an encoder to strengthen the encoder's encoding of deep feature maps; then, an adaptive sharpening strategy is provided to keep the decision boundaries of unlabeled data in low-density regions; to further reduce the influence of noise, a low-confidence consistency strategy is provided to ensure consistency between classification and segmentation; the method modifies the semi-supervised image semantic segmentation network as follows: assuming the input image size is H×W and the number of categories is C, the network output is changed to the mean $\mu_s \in \mathbb{R}^{H\times W\times C}$ of the segmentation result and the corresponding variance, and the mean $\mu_c \in \mathbb{R}^{C}$ of the classification result and the corresponding variance are also output at the last layer of the encoder;

$L = L_s + \lambda L_u$

for labeled data $x_l \in \mathbb{R}^{H\times W\times 3}$, the corresponding segmentation label is $y_s \in \mathbb{R}^{H\times W\times C}$ and the class label is $y_c \in \mathbb{R}^{C}$; $x_l$ is fed into the network to obtain the corresponding means and variances, the reparameterization trick is used to sample the segmentation prediction $z_s$ and the classification prediction $z_c$, and cross-entropy loss is then used to supervise the segmentation result $z_s$ with $y_s$ and the classification result $z_c$ with $y_c$; for labeled data, the loss term is defined as:

for unlabeled data, the feature gradient map regularization strategy FGMR is first used to enhance the edge gradient values of the feature maps produced by the encoder; then, the variance is used as aleatoric uncertainty to identify noisy samples and to guide the adaptive sharpening strategy that produces pseudo-labels for the unlabeled data, and the noisy pseudo-labels are used to supervise the unlabeled data; even though the aleatoric uncertainty filters out some noisy samples, noise in the pseudo-labels can still affect the performance of the network; to address this problem, the low-confidence classes in the classification result are further used to suppress the segmentation predictions of the corresponding classes, keeping classification and segmentation consistent at the class level; the adaptive sharpening loss and the class consistency loss counterbalance each other, so that the decision boundary lies in a low-density region and a robust prediction is obtained; the unsupervised loss function is defined as:

where the three terms are the loss terms of the feature gradient map regularization strategy FGMR, the adaptive sharpening strategy (adaptive sharp), and the class consistency strategy (class consistency), respectively.

2. The semi-supervised image semantic segmentation method based on entropy minimization of claim 1, wherein the feature gradient map regularization strategy FGMR has the following specific implementation formula:

3. The semi-supervised image semantic segmentation method based on entropy minimization of claim 1, wherein the adaptive sharpening strategy (adaptive sharp) is implemented as follows:

first, the common sharpening strategy is defined as:

where T is a hyperparameter; as T→0, the output of Sharpen(p, T) approaches a Dirac distribution; since the sharpened result is the target for unlabeled data, lowering T encourages the model to produce low-entropy predictions; in the image segmentation task, T is designed in advance, but the same T is not set for all samples;

Equations (1) and (2) adaptively generate a pseudo-label for each sample, and the mean squared error loss is then used to optimize the network on the unlabeled data, namely:

4. The semi-supervised image semantic segmentation method based on entropy minimization as set forth in claim 1, wherein the class consistency strategy (class consistency) is implemented by the following formula:

where $p^c = \mathrm{softmax}(\mu_c)$, $p^s = \mathrm{softmax}(\mu_s)$, and $\beta$ is a threshold that determines the low-confidence consistency boundary.

5. The semi-supervised image semantic segmentation method based on entropy minimization as set forth in claim 4, wherein the following steps