CN113516130A - Entropy minimization-based semi-supervised image semantic segmentation method

Entropy minimization-based semi-supervised image semantic segmentation method

Info

Publication number
CN113516130A
Authority
CN
China
Prior art keywords
strategy
loss
consistency
segmentation
network
Prior art date
2021-07-19
Legal status
Granted
Application number
CN202110811842.5A
Other languages
Chinese (zh)
Other versions
CN113516130B (en)
Inventor
李佐勇 (Li Zuoyong)
吴嘉炜 (Wu Jiawei)
樊好义 (Fan Haoyi)
张晓青 (Zhang Xiaoqing)
赖桃桃 (Lai Taotao)
Current Assignee
Minjiang University
Original Assignee
Minjiang University
Priority date
2021-07-19
Filing date
2021-07-19
Publication date
2021-10-19
2021-07-19 Application filed by Minjiang University
2021-07-19 Priority to CN202110811842.5A
2021-10-19 Publication of CN113516130A
2024-01-05 Application granted
2024-01-05 Publication of CN113516130B
Status: Active
Anticipated expiration

Classifications

    • G06F18/24 Pattern recognition; Analysing; Classification techniques
    • G06N3/04 Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology
    • G06N3/082 Neural networks; Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y02T10/40 Climate change mitigation technologies related to transportation; Engine management systems


Abstract

The invention relates to a semi-supervised image semantic segmentation method based on entropy minimization. First, a feature gradient map regularization strategy (FGMR) is proposed, which uses the gradient map of a low-level feature map in the encoder to enhance the encoder's ability to encode deep feature maps. Second, an adaptive sharpening strategy is proposed to keep the decision boundary of unlabeled data in low-density regions. Finally, to further reduce the influence of noise, a low-confidence consistency strategy is proposed to keep classification and segmentation consistent. Extensive experiments confirm the superiority of the proposed algorithm over existing methods.

Description

Entropy minimization-based semi-supervised image semantic segmentation method
Technical Field
The invention belongs to the technical field of computer vision and is used for semi-supervised image semantic segmentation, which is vital when only a small amount of annotated data and a large amount of unannotated data are available. It particularly relates to a semi-supervised image semantic segmentation method based on entropy minimization.
Background
In recent years, with the development of deep supervised learning, remarkable progress has been made on various computer vision tasks. However, training a deep neural network requires a large amount of labeled data, which is often time-consuming and expensive to acquire. Semantic segmentation in particular requires a large number of pixel-level labels, whose annotation cost is roughly 15 times that of region-level labels and 60 times that of image-level labels. The cost is even higher for medical image segmentation, which must be labeled by professional physicians. Therefore, there is growing interest in weakly supervised and semi-supervised segmentation methods.
Semi-supervised image semantic segmentation assumes a large amount of unlabeled data and a limited amount of labeled data drawn from the same distribution. Current mainstream semi-supervised segmentation methods can be divided into methods based on generative adversarial networks (GANs) and methods based on consistency training. GAN-based approaches extend the generic GAN framework to pixel-level prediction and try to fool the discriminator with predictions on unlabeled data. Consistency training methods expect the network output to be smooth under different perturbations. These methods have proven effective for semi-supervised image semantic segmentation, but they also have limitations. GAN-based methods exploit unlabeled data but require carefully designed network structures and are difficult to train. Consistency training methods must forward-propagate each perturbed input, incurring extra computation, and the perturbations implicitly act as data augmentation, which makes comparisons against fully supervised models trained without augmentation unfair.
Disclosure of Invention
The invention aims to overcome the above defects by providing a semi-supervised image semantic segmentation method based on entropy minimization. It first proposes a feature gradient map regularization strategy (FGMR), which uses the gradient map of a low-level feature map in the encoder to enhance the encoder's ability to encode deep feature maps; it then proposes an adaptive sharpening strategy that keeps the decision boundary of unlabeled data in low-density regions; and, to further reduce the influence of noise, it proposes a low-confidence consistency strategy to keep classification and segmentation consistent. The proposed method significantly improves semi-supervised semantic segmentation performance with essentially no network structure modification and no extra computational cost.
In order to achieve this purpose, the technical scheme of the invention is as follows: a semi-supervised image semantic segmentation method based on entropy minimization, which first proposes a feature gradient map regularization strategy FGMR that uses the gradient map of a low-level feature map in the encoder to enhance the encoder's ability to encode deep feature maps; then proposes an adaptive sharpening strategy that keeps the decision boundary of unlabeled data in low-density regions; and, to further reduce the influence of noise, proposes a low-confidence consistency strategy to keep classification and segmentation consistent.
In an embodiment of the present invention, the method modifies the semi-supervised image semantic segmentation network structure as follows: assuming the input image size is H × W and the number of classes is C, the network output is changed to the mean μ_s ∈ R^(H×W×C) and variance σ_s² of the segmentation result, and the last layer of the encoder additionally outputs the mean μ_c ∈ R^C and variance σ_c² of the classification result.
For the improvement of the loss function of the network: the loss function contains a supervised loss term and an unsupervised loss term,

L = L_s + λ·L_u

where L_s is the supervised loss term, L_u is the unsupervised loss term, and λ is a hyper-parameter that adjusts the balance between the supervised and unsupervised loss terms.

For labeled data x_l ∈ R^(H×W×3), the corresponding segmentation label is y_s ∈ R^(H×W×C) and the class label is y_c ∈ R^C. x_l is fed into the network to obtain the corresponding means and variances, from which the segmentation prediction z_s and the classification prediction z_c are sampled using the reparameterization trick; cross-entropy losses are then used to supervise the segmentation result z_s with y_s and the classification result z_c with y_c. For labeled data, the loss term is defined as:

L_s = Σ_{H,W,C} H(y_s^{H,W,C}, α_s(z_s^{H,W,C})) + Σ_C H(y_c^C, α_c(z_c^C))

where H(·,·) is the cross-entropy loss function and α(·) is the activation function of the last layer.
for the unlabeled data, firstly, enhancing the edge gradient value of the feature map obtained by the encoder by using a feature gradient map regularization strategy FGMR; then, searching a noise sample by using the variance as an accidental uncertainty, and guiding an adaptive sharpening strategy to obtain a pseudo label of the unlabeled data, wherein the pseudo label which may bring noise is used for supervising the unlabeled data; even though the accidental uncertainty performance filters some noise samples, the noise samples generated by the pseudo-tag still probably affect the performance of the network; to solve this problem, the low confidence class in the classification result is further used to suppress the segmentation prediction of the corresponding class to maintain the consistency of the class; the adaptive sharpening loss and the similar consistency loss can resist each other, so that the decision boundary is in a low-density area, and a steady prediction result is obtained; the unsupervised loss function is defined as:
Figure BDA0003168602120000024
wherein the content of the first and second substances,
Figure BDA0003168602120000025
and
Figure BDA0003168602120000026
the loss terms are respectively of a feature gradient map regularization strategy FGMR, an adaptive sharpening strategy adaptive sharp and a class consistency strategy class consistency.
In an embodiment of the present invention, the feature gradient map regularization strategy FGMR is implemented by the following loss:

L_fgmr = [equation image in the original publication]

where ∇ is the gradient operator and S_e is the encoder of the segmentation network; the gradient term used as the regularization target is detached during training so that no back-propagation is performed through it.
In an embodiment of the present invention, the adaptive sharpening strategy is implemented as follows.

First, a common sharpening strategy is defined as:

sharpen(p, T)_i = p_i^(1/T) / Σ_j p_j^(1/T)

where T is a temperature hyper-parameter. As T → 0, the result of sharpen(p, T) approaches a Dirac distribution. Since the sharpened result serves as the target for unlabeled data, lowering T encourages the model to produce low-entropy predictions. However, T must be set carefully; in image segmentation tasks in particular, it is not reasonable to use the same T value for all samples.

Therefore, an adaptive sharpening strategy is proposed that filters noisy samples using the variance as the aleatoric uncertainty and adaptively adjusts the T value of each sample according to the prediction confidence, so that the lower the confidence, the stronger the sharpening of the sample:

[adaptive sharpening equations shown as images in the original publication]

These equations adaptively generate a pseudo-label for each sample; the unlabeled data are then optimized with a mean-squared-error loss between the prediction and the pseudo-label:

L_as = [equation image in the original publication]
in an embodiment of the present invention, a specific implementation formula of the class consistency policy is as follows:
Figure BDA0003168602120000036
wherein p isc=softmax(μc),ps=softmax(μs) And β is a threshold for determining a low confidence coherency boundary.
In one embodiment of the present invention, the
Figure BDA0003168602120000037
Compared with the prior art, the invention has the following beneficial effects. The invention provides a novel entropy-minimization-based semi-supervised image semantic segmentation method. Entropy minimization has proven to be an effective way of enforcing the cluster assumption in semi-supervised learning, namely that decision boundaries should lie in low-density regions. Specifically, the invention proposes a feature gradient map regularization strategy to enlarge inter-class distances in feature space and obtain low-entropy segmentation predictions. In addition, an adaptive sharpening strategy based on aleatoric uncertainty and a class consistency regularization constraint are introduced to reduce the interference of noise in the pseudo-labels. Extensive experiments on the PASCAL VOC, PASCAL-Context and blood leukocyte datasets show that the method significantly improves semi-supervised semantic segmentation performance with essentially no network structure change and no extra computational cost.
Drawings
FIG. 1 is a network model architecture of the present invention.
FIG. 2 is a graph showing statistics and observations of the gradient of the U-net coding layer on the leukocyte test set.
FIG. 3 shows partial segmentation results on the PASCAL VOC dataset with 1/8 labeled samples.
FIG. 4 shows partial segmentation results on the blood leukocyte dataset with 1/10 labeled samples.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention relates to a semi-supervised image semantic segmentation method based on entropy minimization, which first proposes a feature gradient map regularization strategy FGMR, where the gradient map of a low-level feature map in the encoder is used to enhance the encoder's ability to encode deep feature maps; then proposes an adaptive sharpening strategy that keeps the decision boundary of unlabeled data in low-density regions; and, to further reduce the influence of noise, proposes a low-confidence consistency strategy to keep classification and segmentation consistent.
The following is a specific embodiment of the present invention.
1. Overview of the method
The invention only requires a slight modification of an existing segmentation network and does not need a carefully designed network structure. The network architecture of the invention is shown in FIG. 1. Assuming the input image size is H × W and the number of classes is C, the specific modification is: the network output is changed to the mean μ_s ∈ R^(H×W×C) and variance σ_s² of the segmentation result, and the last layer of the encoder additionally outputs the mean μ_c ∈ R^C and variance σ_c² of the classification result.
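For illustration only, a minimal PyTorch-style sketch of such a modification is shown below; the module names, the use of log-variance heads, and the global-pooling classification branch are assumptions of this sketch rather than details taken from the patent.

```python
import torch.nn as nn

class SemiSegNet(nn.Module):
    """Illustrative wrapper: adds mean/variance heads for segmentation and classification."""

    def __init__(self, encoder, decoder, enc_channels, dec_channels, num_classes):
        super().__init__()
        self.encoder = encoder                      # any backbone encoder (placeholder)
        self.decoder = decoder                      # produces (N, dec_channels, H, W)
        # per-pixel segmentation mean and log-variance over C classes
        self.seg_mu = nn.Conv2d(dec_channels, num_classes, kernel_size=1)
        self.seg_logvar = nn.Conv2d(dec_channels, num_classes, kernel_size=1)
        # image-level classification mean and log-variance from the last encoder layer
        self.cls_mu = nn.Linear(enc_channels, num_classes)
        self.cls_logvar = nn.Linear(enc_channels, num_classes)

    def forward(self, x):
        feats = self.encoder(x)                     # deepest encoder feature map (N, enc_channels, h, w)
        pooled = feats.mean(dim=(2, 3))             # global average pooling for the classification branch
        dec = self.decoder(feats)                   # decoder output at input resolution
        return {
            "seg_mu": self.seg_mu(dec),             # mu_s
            "seg_logvar": self.seg_logvar(dec),     # log sigma_s^2
            "cls_mu": self.cls_mu(pooled),          # mu_c
            "cls_logvar": self.cls_logvar(pooled),  # log sigma_c^2
        }
```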
In addition to the above minor changes to the network, the algorithm of the invention improves the loss function of the network, which contains both a supervised and an unsupervised loss term:

L = L_s + λ·L_u    (1)

where L_s is the supervised loss term, L_u is the unsupervised loss term, and λ is a hyper-parameter that adjusts the balance between the two.
For labeled data x_l ∈ R^(H×W×3), the corresponding segmentation label is y_s ∈ R^(H×W×C) and the class label is y_c ∈ R^C. x_l is fed into the network to obtain the corresponding means and variances, from which the segmentation prediction z_s and the classification prediction z_c are sampled using the reparameterization trick; the standard cross-entropy loss is then used to supervise the segmentation result z_s with y_s and the classification result z_c with y_c. For labeled data, the loss term is defined as:

L_s = Σ_{H,W,C} H(y_s^{H,W,C}, α_s(z_s^{H,W,C})) + Σ_C H(y_c^C, α_c(z_c^C))    (2)

where H(·,·) is the cross-entropy loss function and α(·) is the activation function of the last layer.
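The sampling step and the supervised loss of equation (2) could be sketched as follows; treating the image-level label as a single class index and folding the activation α into the cross-entropy call are simplifying assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (reparameterization trick)."""
    std = torch.exp(0.5 * logvar)
    return mu + torch.randn_like(std) * std

def supervised_loss(out, y_seg, y_cls):
    """L_s: cross-entropy on sampled segmentation and classification predictions.

    y_seg: (N, H, W) integer class map; y_cls: (N,) integer image-level label.
    F.cross_entropy applies log-softmax internally, standing in for the activation alpha;
    if the class label were a multi-label presence vector, a binary cross-entropy
    would be used for the classification branch instead.
    """
    z_s = reparameterize(out["seg_mu"], out["seg_logvar"])   # (N, C, H, W)
    z_c = reparameterize(out["cls_mu"], out["cls_logvar"])   # (N, C)
    return F.cross_entropy(z_s, y_seg) + F.cross_entropy(z_c, y_cls)
```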
For unlabeled data, Feature Gradient Map Regularization (FGMR) is first used to enhance the edge gradient values of the feature maps obtained by the encoder; the variance is then used as the aleatoric uncertainty to identify noisy samples and to guide adaptive sharpening, which produces pseudo-labels for the unlabeled data. These pseudo-labels, which may introduce noise, are used to supervise the unlabeled data. Even though the aleatoric uncertainty filters out some noisy samples, the noise produced by the pseudo-labels may still affect network performance. To address this, the low-confidence classes in the classification result are further used to suppress the segmentation predictions of the corresponding classes so as to maintain class consistency. The adaptive sharpening loss and the class consistency loss counteract each other, keeping the decision boundary in a low-density region and yielding a stable prediction. The unsupervised loss function is defined as:

L_u = L_fgmr + L_as + L_cc

where L_fgmr, L_as and L_cc are the loss terms of Feature Gradient Map Regularization (FGMR), adaptive sharpening, and class consistency, respectively.
2. Feature gradient map regularization
As shown in FIG. 2, the gradient statistics of the different encoder layers show that the encoder's ability to extract edge information increases gradually from lower to higher layers. After consistency training, the average gradient of the different encoder layers is significantly enhanced. These results indicate that a good segmentation network seeks more edge information to improve segmentation accuracy. Inspired by these observations, a key goal for semantic segmentation is to improve the encoder's ability to discern target edges. As shown in FIG. 2(b) and FIG. 2(c), the gradient information of edges in the deep encoder layers is clearly enhanced after consistency training [1], which suggests that consistency training is effective precisely because it gives the encoder more edge-discriminating power. Therefore, combining the progressive nature of the gradient information across coding layers with the goal of improving edge discrimination, the feature gradient map regularization is designed as:

L_fgmr = [equation image in the original publication]

where ∇ is the gradient operator and S_e is the encoder of the segmentation network; the gradient term used as the regularization target is detached during training so that it is not back-propagated through.
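Because the exact FGMR formula appears only as an equation image, the sketch below encodes one possible reading of the description: the detached gradient map of a shallow encoder feature is used as a target that sharpens the gradient map of a deeper feature. The Sobel gradient operator, the bilinear resizing, the channel averaging, the MSE form, and the function names are all assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

# Sobel kernels as one concrete choice of gradient operator (an assumption;
# the patent only names "a gradient operator").
_SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
_SOBEL_Y = _SOBEL_X.transpose(2, 3)

def gradient_map(feat):
    """Channel-wise gradient magnitude of a feature map of shape (N, C, H, W)."""
    c = feat.shape[1]
    gx = F.conv2d(feat, _SOBEL_X.to(feat).repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(feat, _SOBEL_Y.to(feat).repeat(c, 1, 1, 1), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def fgmr_loss(low_feat, deep_feat):
    """One possible reading of L_fgmr: pull the deep feature's gradient map toward the
    detached gradient map of a shallow encoder feature (no back-propagation through it)."""
    target = gradient_map(low_feat).detach()
    pred = gradient_map(deep_feat)
    target = F.interpolate(target, size=pred.shape[2:], mode="bilinear", align_corners=False)
    # channel counts usually differ between layers; average over channels as a simplification
    return F.mse_loss(pred.mean(1, keepdim=True), target.mean(1, keepdim=True))
```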
3. Adaptive sharpening
The sharpening strategy proposed by the MixMatch algorithm [2] is used to reduce the entropy of the label distribution by adjusting the "temperature" of the class distribution. The sharpening strategy is defined as:

sharpen(p, T)_i = p_i^(1/T) / Σ_j p_j^(1/T)

where T is a temperature hyper-parameter. As T → 0, the result of sharpen(p, T) approaches a Dirac distribution. Since the sharpened result serves as the target for unlabeled data, lowering T encourages the model to produce low-entropy predictions. However, T must be set carefully; in image segmentation tasks in particular, it is not reasonable to use the same T value for all samples.
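The sharpening operation can be written directly from this definition; a minimal sketch (the function name is illustrative):

```python
def sharpen(p, T):
    """MixMatch-style sharpening: raise class probabilities to 1/T and renormalize.

    p: tensor of probabilities with classes on the last dimension;
    smaller T gives a more peaked (lower-entropy) distribution.
    """
    p_t = p.pow(1.0 / T)
    return p_t / p_t.sum(dim=-1, keepdim=True)
```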
Therefore, the adaptive sharpening proposed by the invention uses the variance predicted by the algorithm as the aleatoric uncertainty to filter noisy samples, and adaptively adjusts the T value of each sample according to the prediction confidence, so that the lower the confidence, the stronger the sharpening of the sample:

[adaptive sharpening equations shown as images in the original publication]

These equations adaptively generate a pseudo-label for each sample; the unlabeled data are then optimized with a mean-squared-error loss between the prediction and the pseudo-label:

L_as = [equation image in the original publication]
the self-adaptive sharpening provided by the invention enables the network model to pay more attention to non-noise samples and samples which are difficult to classify, and pay less attention to noise samples and samples which are easy to classify.
4. Category consistency
Since strongly sharpening hard samples may introduce additional noise into the network, an additional noise-smoothing strategy is needed. Because of the imbalance of the class distribution and the limited number of samples, the neural network's high-confidence predictions are not guaranteed to be correct and can easily mislead the segmentation result, whereas the network can far more easily be correct about its low-confidence predictions. Therefore, classification and segmentation are required to be consistent on low-confidence predictions rather than on high-confidence ones. The loss function can be expressed as:

L_cc = [equation image in the original publication]

where p_c = softmax(μ_c), p_s = softmax(μ_s), and β is a threshold that determines the low-confidence consistency boundary; its value in the invention is given as an equation image in the original publication.
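The class consistency loss is likewise given only as an equation image, so the sketch below encodes just the described intent: segmentation probability mass assigned to classes whose image-level confidence falls below β is penalized. The exact penalty form, the function name, and the value β = 0.3 are assumptions of this sketch.

```python
import torch.nn.functional as F

def class_consistency_loss(seg_mu, cls_mu, beta=0.3):
    """Illustrative low-confidence class consistency loss (beta = 0.3 is an assumed value).

    For every pixel, penalize the segmentation probability assigned to classes whose
    image-level classification confidence p_c falls below the threshold beta.
    """
    p_s = F.softmax(seg_mu, dim=1)                     # (N, C, H, W)
    p_c = F.softmax(cls_mu, dim=1)                     # (N, C)
    low_conf = (p_c < beta).float()                    # 1 for low-confidence classes
    return (p_s * low_conf[:, :, None, None]).mean()   # broadcast the class mask over pixels
```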
5. Experimental data and evaluation
The PASCAL VOC, PASCAL-Context and blood leukocyte datasets are used to evaluate the performance of the algorithm. The PASCAL VOC dataset consists of 21 classes (including background). Data augmentation is performed on the training set; the augmented dataset contains 10582 training images and 1449 validation images. The PASCAL-Context dataset is a whole-scene parsing dataset comprising 4998 training images and 5105 test images with dense semantic labels. Following previous work [3], semantic labels of the 60 most frequent classes, including the background class, are used. The blood leukocyte image dataset contains 3 categories; it was collected from a regular hospital and comprises 500 training images of size 256 × 256 and 500 test images of the same size.
The mean intersection-over-union (mIoU) is adopted as the metric for PASCAL VOC and PASCAL-Context, and F1-score, recall, precision and accuracy are used as the evaluation metrics for the blood leukocyte dataset.
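For reference, mIoU follows its standard definition (per-class intersection over union, averaged over classes); a small sketch, not specific to this patent:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, from integer label maps (numpy arrays)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                                  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```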
6. Ablation study
Table 1. Ablation study of the contribution of each loss term on the PASCAL VOC dataset with 1/8 labeled data.
[Table shown as an image in the original publication.]
The loss of the invention consists of three terms, so the effectiveness of each loss term and of their combinations is explored. The ablation results are shown in Table 1, where CE, sharp, AS, CC and FGMR denote cross entropy, sharpening, adaptive sharpening, class consistency and feature gradient map regularization, respectively. All three loss terms effectively improve performance, and performance improves further after feature gradient map regularization is added.
7. Qualitative and quantitative comparison
Table 2 shows the evaluation results on the PASCAL VOC and PASCAL-Context datasets. When unlabeled samples are used under different data partitions, the performance of the algorithm improves by 2.4% to 7.7% over the baseline method (DeepLabv2). The method of Hung et al. and the s4GAN method are representative semi-supervised image semantic segmentation methods of the last two years. Under the same experimental setup, the algorithm achieves the best results on the PASCAL VOC dataset with 1/3, 1/8 and 1/20 labeled samples and on the PASCAL-Context dataset with 1/3 and 1/8 labeled samples. FIG. 3 shows qualitative results on the PASCAL VOC dataset using 1/8 labeled samples.
Table 2. Comparison of the segmentation results of different methods on the PASCAL VOC and PASCAL-Context datasets.
[Table shown as an image in the original publication.]
To further demonstrate the generality of the algorithm, the blood leukocyte dataset with 1/10 labeled samples is tested without data augmentation. The data in Table 3 show that, over the baseline (U-net) method, the algorithm improves the F1 score by 2.23%, recall by 1.67%, precision by 2.46%, and accuracy by 0.95%. Comparison with the current state-of-the-art semi-supervised medical semantic segmentation methods shows that the algorithm achieves the best segmentation results at the lowest cost. FIG. 4 shows partial segmentation results on the leukocyte dataset with 1/10 labeled samples; the algorithm effectively segments the cytoplasm even in leukocyte images whose cytoplasm is close to the background.
Table 3. Comparison of semi-supervised segmentation performance on the blood leukocyte dataset using 1/10 labeled samples.
[Table shown as an image in the original publication.]
8. Spatial complexity comparison
Table 4. Comparison of spatial complexity on the PASCAL VOC dataset.
[Table shown as an image in the original publication.]
As shown by the spatial-complexity comparison on the PASCAL VOC dataset in Table 4, the algorithm of the invention adds only 1.16M extra parameters over the baseline (DeepLabv2), whereas the methods of Hung et al. and Mittal et al. add 2.78M extra parameters. The extra parameters of the algorithm are thus less than half of those of the compared methods.
References:
[1] Chen S, Bortsova G, García-Uceda Juárez A, et al. Multi-task attention-based semi-supervised learning for medical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019: 457-465.
[2] Berthelot D, Carlini N, Goodfellow I, et al. MixMatch: A holistic approach to semi-supervised learning[J]. arXiv preprint arXiv:1905.02249, 2019.
[3] Mittal S, Tatarchenko M, Brox T. Semi-supervised semantic segmentation with high- and low-level consistency[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[4] Hung W C, Tsai Y H, Liou Y T, et al. Adversarial learning for semi-supervised semantic segmentation[C]//29th British Machine Vision Conference, BMVC 2018, 2019.
[5] Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[6] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015: 234-241.
[7] Chen S, Bortsova G, García-Uceda Juárez A, et al. Multi-task attention-based semi-supervised learning for medical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019: 457-465.
[8] Yu L, Wang S, Li X, et al. Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019: 605-613.
the above are preferred embodiments of the present invention, and all changes made according to the technical scheme of the present invention that produce functional effects do not exceed the scope of the technical scheme of the present invention belong to the protection scope of the present invention.

Claims (6)

1. A semi-supervised image semantic segmentation method based on entropy minimization, characterized in that a feature gradient map regularization strategy FGMR is first proposed, which uses the gradient map of a low-level feature map in the encoder to enhance the encoder's ability to encode deep feature maps; an adaptive sharpening strategy is then proposed to keep the decision boundary of unlabeled data in low-density regions; and, to further reduce the influence of noise, a low-confidence consistency strategy is proposed to keep classification and segmentation consistent.
2. The entropy-minimization-based semi-supervised image semantic segmentation method according to claim 1, wherein the method modifies the semi-supervised image semantic segmentation network structure as follows: assuming the input image size is H × W and the number of classes is C, the network output is changed to the mean μ_s ∈ R^(H×W×C) and variance σ_s² of the segmentation result, and the last layer of the encoder additionally outputs the mean μ_c ∈ R^C and variance σ_c² of the classification result;

for the improvement of the loss function of the network, the loss function contains a supervised loss term and an unsupervised loss term:

L = L_s + λ·L_u

where L_s is the supervised loss term, L_u is the unsupervised loss term, and λ is a hyper-parameter that adjusts the balance between the supervised and unsupervised loss terms;

for labeled data x_l ∈ R^(H×W×3), the corresponding segmentation label is y_s ∈ R^(H×W×C) and the class label is y_c ∈ R^C; x_l is fed into the network to obtain the corresponding means and variances, from which the segmentation prediction z_s and the classification prediction z_c are sampled using the reparameterization trick; cross-entropy losses are then used to supervise the segmentation result z_s with y_s and the classification result z_c with y_c; for labeled data, the loss term is defined as:

L_s = Σ_{H,W,C} H(y_s^{H,W,C}, α_s(z_s^{H,W,C})) + Σ_C H(y_c^C, α_c(z_c^C))

where H(·,·) is the cross-entropy loss function and α(·) is the activation function of the last layer;

for unlabeled data, the feature gradient map regularization strategy FGMR is first used to enhance the edge gradient values of the feature maps obtained by the encoder; the variance is then used as the aleatoric uncertainty to identify noisy samples and to guide the adaptive sharpening strategy that produces pseudo-labels for the unlabeled data, and these pseudo-labels, which may introduce noise, are used to supervise the unlabeled data; even though the aleatoric uncertainty filters out some noisy samples, the noise produced by the pseudo-labels may still affect network performance; to address this, the low-confidence classes in the classification result are further used to suppress the segmentation predictions of the corresponding classes so as to maintain class consistency; the adaptive sharpening loss and the class consistency loss counteract each other, keeping the decision boundary in a low-density region and yielding a robust prediction; the unsupervised loss function is defined as:

L_u = L_fgmr + L_as + L_cc

where L_fgmr, L_as and L_cc are the loss terms of the feature gradient map regularization strategy FGMR, the adaptive sharpening strategy, and the class consistency strategy, respectively.
3. The entropy-minimization-based semi-supervised image semantic segmentation method according to claim 2, wherein the feature gradient map regularization strategy FGMR is implemented by the following loss:

L_fgmr = [equation image in the original publication]

where ∇ is the gradient operator and S_e is the encoder of the segmentation network; the gradient term used as the regularization target is detached during training so that no back-propagation is performed through it.
4. The entropy-minimization-based semi-supervised image semantic segmentation method according to claim 2, wherein the adaptive sharpening strategy is implemented as follows:

first, a common sharpening strategy is defined as:

sharpen(p, T)_i = p_i^(1/T) / Σ_j p_j^(1/T)

where T is a temperature hyper-parameter; as T → 0, the result of sharpen(p, T) approaches a Dirac distribution; since the sharpened result serves as the target for unlabeled data, lowering T encourages the model to produce low-entropy predictions; however, T must be set carefully, and in image segmentation tasks in particular it is not reasonable to use the same T value for all samples;

therefore, an adaptive sharpening strategy is proposed that filters noisy samples using the variance as the aleatoric uncertainty and adaptively adjusts the T value of each sample according to the prediction confidence, so that the lower the confidence, the stronger the sharpening of the sample:

[adaptive sharpening equations shown as images in the original publication]

these equations adaptively generate a pseudo-label for each sample, and the unlabeled data are then optimized with a mean-squared-error loss between the prediction and the pseudo-label:

L_as = [equation image in the original publication]
5. The entropy-minimization-based semi-supervised image semantic segmentation method according to claim 2, wherein the class consistency strategy is implemented by the following loss:

L_cc = [equation image in the original publication]

where p_c = softmax(μ_c), p_s = softmax(μ_s), and β is a threshold that determines the low-confidence consistency boundary.
6. The entropy-minimization-based semi-supervised image semantic segmentation method according to claim 5, wherein β is set to a fixed value, given as an equation image in the original publication.

Priority Applications (1)

Application Number: CN202110811842.5A, Priority Date: 2021-07-19, Filing Date: 2021-07-19, Title: Semi-supervised image semantic segmentation method based on entropy minimization, granted as CN113516130B (en)


Publications (2)

Publication Number Publication Date
CN113516130A true CN113516130A (en) 2021-10-19
CN113516130B CN113516130B (en) 2024-01-05

Family

ID=78068524

Family Applications (1)

Application Number: CN202110811842.5A, Status: Active, Granted publication: CN113516130B (en), Title: Semi-supervised image semantic segmentation method based on entropy minimization

Country Status (1)

Country Link
CN (1) CN113516130B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222690A * 2019-04-29 2019-09-10 Zhejiang University Unsupervised domain adaptation semantic segmentation method based on maximum squares loss
CN110837836A * 2019-11-05 2020-02-25 University of Science and Technology of China Semi-supervised semantic segmentation method based on maximized confidence
CN112036335A * 2020-09-03 2020-12-04 Nanjing Agricultural University Deconvolution-guided semi-supervised plant leaf disease identification and segmentation method
CN113128620A * 2021-05-11 2021-07-16 Beijing Institute of Technology Semi-supervised domain-adaptive image classification method based on hierarchical relationship


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Jiawei et al., "A two-stage image dehazing network based on deep learning", Computer Applications and Software, vol. 37, no. 4, pp. 197-202 *

Also Published As

Publication number Publication date
CN113516130B (en) 2024-01-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant