CN110322445A

CN110322445A - Semantic segmentation method based on a loss function that maximizes the correlation between prediction and label

Info

Publication number: CN110322445A
Application number: CN201910505928.8A
Authority: CN
Inventors: 赵帅; 蔡登; 武伯熹
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-06-12
Filing date: 2019-06-12
Publication date: 2019-10-11
Anticipated expiration: 2039-06-12
Also published as: CN110322445B

Abstract

The invention discloses a semantic segmentation method based on a loss function that maximizes the correlation between prediction and label, comprising: (1) inputting a real-scene image into a segmentation model to obtain a prediction image; (2) sliding a Gaussian kernel over the prediction image and the label image by convolution to obtain local statistical features; (3) computing, from the local statistical features, the strength of the linear correlation between corresponding regions of the prediction image and the label image; (4) using the index of linear-correlation strength as a weight to adjust the cross-entropy loss of each pixel in the image, and performing hard example mining; (5) updating the weight parameters of the segmentation model according to the resulting loss value; (6) repeating the above steps until training ends, and applying the trained model to semantic segmentation. With the invention, the segmentation model pays more attention during training to the points that cause low correlation between prediction and label, which improves its segmentation quality.

Description

A semantic segmentation method based on a loss function that maximizes the correlation between prediction and label

Technical field

The invention belongs to the field of image semantic segmentation in computer vision, and more particularly relates to a semantic segmentation method based on a loss function that maximizes the correlation between prediction and label.

Background technique

Semantic segmentation is a fundamental problem in computer vision, with wide applications in autonomous driving, medical image analysis, geographic information systems, robotics, and other fields. In practice, semantic segmentation is usually treated as a multi-class classification problem over the pixels of an image, where the goal is to assign a predefined semantic label to every pixel. In recent years, with the development of convolutional neural networks and the introduction of segmentation models with stronger learning capacity, great progress has been made on semantic segmentation. These models are generally trained and optimized by minimizing the average per-pixel classification loss. The most commonly used semantic segmentation loss is the softmax cross-entropy loss function:
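The equation itself is not reproduced in this text. A standard form of the softmax cross-entropy loss that matches the symbols defined next (N, C, y, p) is given here as a reconstruction; the name \mathcal{L}_{ce} and the index notation are assumptions:

```latex
\mathcal{L}_{ce}(y, p) = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} y_{n,c} \log p_{n,c}
```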

where N is the number of pixels in the image, C is the number of object classes, y ∈ {0,1} is the class label representing the true class of a pixel, and p ∈ [0,1] is the probability predicted by the segmentation model, usually produced by a softmax operation. As the formula shows, the pixel-wise cross-entropy loss treats the points in an image as mutually independent samples and takes the average cross-entropy loss over all points as the total loss of the model's prediction. However, the points in an image are strongly dependent on one another, and these dependencies carry the structural information of objects. Because a pixel-wise loss ignores the relationships between points, a segmentation model trained under its supervision often performs poorly when the visual features of the foreground are weak or when pixels belong to objects with small spatial structures.

To exploit the structural information of the objects in an image, the article "Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials", presented at the 26th Conference on Neural Information Processing Systems in 2012, proposed an efficient fully connected conditional random field (CRF) that models the relationships between points in an image and encourages points with similar visual appearance in the real image to receive more consistent predictions. However, when a CRF is used as a post-processing step, it usually requires a time-consuming iterative inference procedure and is sensitive to changes in visual appearance.

The article "Semantic Segmentation using Adversarial Networks", presented at the Workshop on Adversarial Training of the 30th Conference on Neural Information Processing Systems in 2016, proposed training a segmentation model with the idea of generative adversarial networks (GANs): an additional discriminator network judges whether the prediction image of the segmentation model and the label image are consistent in their high-level structure. However, GANs are generally hard to train and need more memory during training to hold the deep generator and discriminator networks at the same time.

The article "Adaptive Affinity Fields for Semantic Segmentation", presented at the European Conference on Computer Vision in 2018, proposed an affinity field loss function. This loss applies a converging force to the predictions of neighboring points that belong to the same object class, so that their predictions become similar, and a dispersing force to the predictions of neighboring points that belong to different classes, so that their predictions become dissimilar. This increases the similarity of predictions for same-class neighbors and the dissimilarity for different-class neighbors, and achieves good segmentation results. However, when computing the loss value, this method must store a matrix of neighbor-point pairs, which typically requires several times the memory needed to compute the original loss.

Summary of the invention

To address the shortcomings of the prior art, the invention proposes a semantic segmentation method based on a loss function that maximizes the correlation between prediction and label. By maximizing the correlation between the prediction image output by the segmentation model and the label image, the two reach a higher structural similarity, which improves the segmentation quality of the model.

A semantic segmentation method based on a loss function that maximizes the correlation between prediction and label, comprising:

(1) inputting a real-scene image into a segmentation model to obtain a prediction image;

(2) sliding a Gaussian kernel over the prediction image and the label image by convolution to obtain local statistical features, including the local mean and variance;

(3) computing, from the local statistical features, the strength of the linear correlation between corresponding regions of the prediction image and the label image;

(4) using the index of linear-correlation strength as a weight to adjust the cross-entropy loss of each pixel in the image, and performing hard example mining;

(5) computing the structural loss function of the hard examples in each training batch, then computing the total loss function used to optimize the segmentation model, and updating the weight parameters of the segmentation model;

(6) repeating steps (1) to (5), ending training after a preset number of training iterations, and applying the trained model to semantic segmentation.

For two images or image patches x and y, the general form of the structural similarity index (SSIM) is as follows:
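The equation is not reproduced in this text. The widely used three-factor form of SSIM, which matches the symbols defined next, is given here as a reconstruction rather than as the exact formula of the original filing:

```latex
\mathrm{SSIM}(x, y) =
  \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot
  \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \cdot
  \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}
```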

where the three factors measure luminance similarity, contrast similarity, and structural similarity, respectively; μ_x, σ_x², and σ_xy are the mean of x, the variance of x, and the covariance of x and y; and C₁, C₂, and C₃ are small positive constants that stabilize each factor. Under the constraint C₃ = C₂/2, a simplified form of SSIM can be obtained. As the formula above shows, the key to SSIM's ability to measure the structural similarity of images is its third factor, which is in fact the Pearson correlation coefficient between the variables x and y:
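The Pearson correlation coefficient is ρ = σ_xy / (σ_x σ_y); the equation is not reproduced in this text. As a numerical check, the sketch below computes the three SSIM factors for two patches and compares the structure factor with C₃ = 0 against NumPy's correlation coefficient. The function name `ssim_components` and the default constants are illustrative assumptions, not values from the source.

```python
import numpy as np

def ssim_components(x, y, c1=1e-4, c2=9e-4, c3=None):
    """Return the luminance, contrast and structure factors of SSIM.

    The constants are illustrative assumptions; c3 defaults to c2 / 2.
    """
    if c3 is None:
        c3 = c2 / 2.0
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()              # population standard deviations
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # population covariance
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    con = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    struct = (cov_xy + c3) / (sd_x * sd_y + c3)
    return lum, con, struct

rng = np.random.default_rng(0)
x = rng.random((5, 5))
y = 0.7 * x + 0.3 * rng.random((5, 5))

# With c3 = 0 the structure factor is exactly the Pearson coefficient.
_, _, s = ssim_components(x, y, c3=0.0)
pearson = np.corrcoef(x.ravel(), y.ravel())[0, 1]
print(np.isclose(s, pearson))
```

With a nonzero C₃ the structure factor only approximates ρ, which is why the comparison sets it to zero.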

However, SSIM is not suitable for direct use as the loss function of a semantic segmentation model: in the context of semantic segmentation SSIM is not a convex function, so it is hard to optimize and the model may fail to converge to a local minimum. Based on this analysis, the invention proposes a structural loss function, suited to semantic segmentation, that maximizes the correlation between the prediction image and the label image.

In step (2), a Gaussian kernel w = {w_i | i = 1, 2, ..., k²} with a standard deviation of 1.5, whose weights are normalized to sum to 1, is used to estimate the local statistical features:
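The equations are not reproduced in this text. Given a kernel whose weights sum to 1, the usual weighted mean and weighted variance match the symbols defined next; they are shown here as a reconstruction:

```latex
\mu_y = \sum_{i=1}^{k^2} w_i \, y_i, \qquad
\sigma_y^2 = \sum_{i=1}^{k^2} w_i \, (y_i - \mu_y)^2
```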

where μ_y and σ_y² are the local mean and local variance of the label image, and y_i ∈ {0,1} is the value of a pixel in the label image. The local mean and variance of the prediction image are computed with the same formulas.

Sliding this Gaussian kernel pixel by pixel over the prediction image of the segmentation model and over the label image yields the local statistical features of each image. Local statistics obtained this way are isotropic, which benefits the subsequent steps.
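The sliding-window computation of local statistics can be sketched as follows. This is a minimal NumPy illustration, not the implementation of the invention: the function names, the odd kernel size, and the edge-replication padding are assumptions, since the text does not say how image borders are handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_kernel(k=3, sigma=1.5):
    """k x k Gaussian kernel whose weights are normalized to sum to 1."""
    ax = np.arange(k) - (k - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def local_stats(img, kernel):
    """Local weighted mean and variance of a 2-D image (odd kernel size).

    Edge-replication padding is an assumption made for this sketch.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))        # shape (H, W, k, k)
    mean = (windows * kernel).sum(axis=(-2, -1))
    # sum_i w_i (y_i - mu)^2 equals sum_i w_i y_i^2 - mu^2 when sum_i w_i = 1
    var = (windows ** 2 * kernel).sum(axis=(-2, -1)) - mean ** 2
    return mean, np.clip(var, 0.0, None)

w = gaussian_kernel(3, 1.5)
label = np.zeros((5, 5))
label[2, 2] = 1.0                      # a single foreground pixel
mu_y, var_y = local_stats(label, w)
print(mu_y[2, 2], var_y[2, 2])
```

The same `local_stats` call applies unchanged to one channel of the prediction image.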

In step (3), a label image of shape H × W × C (H, W, and C being its height, width, and number of channels) is treated as C binary images. On this basis, the index used to measure the strength of the linear correlation between the prediction image and the label image is:
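The expression for e is not reproduced in this text. One form consistent with the symbols defined next (μ_y, σ_y, p, C₄) and with the local normalization y^nor discussed afterwards is the absolute difference between the locally normalized label and the locally normalized prediction. It is offered as a reconstruction under that assumption, not as the authoritative formula:

```latex
e = \left| \frac{y - \mu_y + C_4}{\sigma_y + C_4} - \frac{p - \mu_p + C_4}{\sigma_p + C_4} \right|
```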

where the error e characterizes the strength of the correlation between two local regions: the smaller e is, the stronger the correlation. μ_y and σ_y are the local mean and local standard deviation of the label image, μ_p and σ_p are the local mean and local standard deviation of the prediction image, the pixel corresponding to label y lies at the center of the local region, p is the probability predicted by the segmentation model, and C₄ = 0.01 is a stabilizing factor. The error e between two local regions measures their degree of linear correlation: the smaller e is, the more likely the two regions are positively correlated, which means their structures are very likely consistent; conversely, a large e indicates that the structures of the two regions are very likely inconsistent. The error e can therefore be regarded as a measure of the structural difference between two local regions.
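A small numerical sketch may make the behavior of e concrete. It assumes that e is the absolute difference between the locally normalized label and prediction, each normalized as (value − local mean + C₄) / (local standard deviation + C₄); this form and the function name `structural_error` are assumptions, because the equation is missing from this text.

```python
import numpy as np

def structural_error(y, p, mu_y, sigma_y, mu_p, sigma_p, c4=0.01):
    """Assumed form of e: |y_nor - p_nor| with C4 as a stabilizing constant."""
    y_nor = (y - mu_y + c4) / (sigma_y + c4)
    p_nor = (p - mu_p + c4) / (sigma_p + c4)
    return np.abs(y_nor - p_nor)

# Identical local structure gives zero error.
e_same = structural_error(y=1.0, p=1.0, mu_y=0.5, sigma_y=0.5,
                          mu_p=0.5, sigma_p=0.5)

# A foreground label whose prediction merely sits at its local mean
# gives a large error.
e_diff = structural_error(y=1.0, p=0.5, mu_y=0.5, sigma_y=0.5,
                          mu_p=0.5, sigma_p=0.1)
print(e_same, e_diff)
```

The arguments may equally be arrays of matching shape, for example the maps returned by a local-statistics routine.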

Because the label image y takes values in {0,1}, we have y² = y. Substituting this into the variance formula gives σ_y² = μ_y − μ_y², from which it further follows that:
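The intermediate step and the resulting expression are not reproduced in this text. The variance identity below follows directly from the weights summing to 1 and y² = y. The expression for y^nor additionally assumes that y^nor is the label term (y − μ_y + C₄)/(σ_y + C₄) of the error e, which is a reconstruction:

```latex
\sigma_y^2 = \sum_i w_i y_i^2 - \mu_y^2 = \sum_i w_i y_i - \mu_y^2 = \mu_y - \mu_y^2,
\qquad
y^{nor} = \frac{y - \mu_y + C_4}{\sqrt{\mu_y - \mu_y^2} + C_4}
```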

where y^nor is the locally normalized value. Taking the derivative of y^nor with respect to μ_y shows that y^nor attains its maximum when y = 1 and all other points in the local region are 0, and attains its minimum when y = 0 and all other points in the local region are 1. The distribution of the prediction image p is usually less extreme than that of the label image y, so the absolute values of the extreme values of the normalized prediction p^nor are smaller than the absolute values of the corresponding extreme values of y^nor.

The statistical features of an image are often spatially non-stationary and can change abruptly. In addition, the global mean and variance are rotation invariant: rotating an image does not change its mean or variance, which is undesirable when measuring the structural similarity of two images. Therefore, to better capture the local details of an image, the invention uses local rather than global statistical features.

In step (4), the following formulas are used to adjust the cross-entropy loss of each pixel in the image and to perform hard example mining:

f_n,c = 1{e_n,c > β e_max},

where n and c are the coordinates of the current pixel in the image, and e_max is the theoretical maximum of the error e. The indicator 1{·} equals 1 when the condition inside it is true and 0 otherwise. β ∈ [0,1) is a weight factor used to select the examples to be discarded, and y_n,c and p_n,c are the label and predicted probability of the current pixel. The loss being weighted is the conventional sigmoid cross-entropy loss function, and the result is the structural loss function, which maximizes the correlation between prediction and label. In practice β is set to 0.1, an empirical value. Using the error e as a weight on the conventional cross-entropy loss of each pixel makes the segmentation model focus during training on the predictions that are likely to make the prediction image inconsistent with the label image, which strengthens the consistency between the two. The cross-entropy loss is still retained here because the logarithmic loss has been shown experimentally in the literature to be a loss function well suited to deep neural network classifiers.
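The re-weighting and selection described above can be sketched per element as e · f · BCE(y, p). The product form is inferred from the prose (e acts as a weight, f discards easy examples), since the second formula is missing from this text. The function names are hypothetical, and `e_max` is passed in as an argument because the theoretical maximum is not stated here.

```python
import numpy as np

def sigmoid_bce(y, p, eps=1e-7):
    """Per-element binary cross-entropy between labels y and probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def hard_example_weighted_loss(e, y, p, e_max, beta=0.1):
    """Return (per-element weighted loss, hard-example mask f)."""
    f = (e > beta * e_max).astype(np.float64)   # f = 1{e > beta * e_max}
    return e * f * sigmoid_bce(y, p), f

# Toy values (illustrative only): rows are pixels n, columns are channels c.
e = np.array([[0.05, 0.80], [0.02, 1.00]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.6], [0.1, 0.3]])
loss, f = hard_example_weighted_loss(e, y, p, e_max=1.0)
print(f)
print(loss)
```

In this toy case the two elements with e ≤ 0.1 are masked out, so only the two poorly predicted elements contribute.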

While re-weighting, the proposed loss function also discards the sample points with low error values. The reason is that, during training, the images in one batch may contain millions or even tens of millions of sample points. In the later stages of training, a segmentation model usually reaches a high pixel accuracy (for example, 96%) but a comparatively low mean intersection-over-union (mIoU) score (for example, 78%). This shows that easily classified examples dominate the loss and make training inefficient. We therefore treat examples with a small structural difference e as easy examples and discard them during training; that is, these easy examples do not take part in computing the final structural loss value. As a result, the hard examples (those with a large structural difference e) that cause low linear correlation between the label image y and the prediction image p receive more attention. In the literature this is known as the online hard example mining (OHEM) strategy.

It is also worth noting that the proposed loss function extracts local statistical features as additional supervision. It is therefore a region-wise loss function, which differs fundamentally from ordinary pixel-wise loss functions, and a model trained with it is also supervised by local statistical information.

In step (5), the structural loss function of the hard examples in a single batch is:
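The equation is not reproduced in this text. A form consistent with the description that follows (sum the per-pixel structural losses and average over the number of hard examples) is shown here as a reconstruction; the names \mathcal{L}_{ssl}, \mathcal{L}_{bce}, and M are assumptions:

```latex
\mathcal{L}_{ssl}(y, p) = \frac{1}{M} \sum_{n=1}^{N} \sum_{c=1}^{C}
  e_{n,c} \, f_{n,c} \, \mathcal{L}_{bce}(y_{n,c}, p_{n,c}),
\qquad
M = \sum_{n=1}^{N} \sum_{c=1}^{C} f_{n,c}
```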

where the normalizing count is the number of hard examples: f_n,c is 1 when the pixel at image coordinate (n, c) is a hard example and 0 otherwise. N is the total number of pixels in the image, and C is the number of object classes. Summing and averaging the structural loss values of the individual pixels gives the total structural loss value of the current training batch. When the structural difference is computed, the difference between the binary image in each channel of the label image and the corresponding prediction image is computed independently. The binary images in different channels are thus treated as mutually independent, and so are the points in different binary images. For this reason, the sigmoid operation rather than the softmax operation is used when computing the structural loss, and hard examples are selected and counted over all sample points of the entire binary image.

Finally, the total loss function used to optimize the segmentation model is:

where λ ∈ [0,1] is a weight factor that balances the relative importance of the conventional cross-entropy loss and the structural similarity loss; in practice λ is set to 0.5. The conventional cross-entropy loss measures the similarity of pixel intensities between the prediction image and the label image, whereas the structural similarity loss measures their structural similarity. In the formula above, the pixel-wise cross-entropy loss plays a role similar to the luminance term of SSIM, and the proposed structural similarity loss plays a role similar to the structure term of SSIM. Note that the sigmoid cross-entropy loss is used here. This means that, unlike most common methods, the invention does not treat semantic segmentation as a multi-class classification problem over the pixels of an image; instead it treats it as multiple binary classification problems per pixel and combines the multiple binary classifiers into one multi-class classifier.
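As a rough sketch of how the two terms might be combined, the code below assumes the total loss is the convex combination λ · BCE + (1 − λ) · structural loss, with the structural loss averaged over the hard examples. Both the combination and which term λ multiplies are assumptions, since the equation is missing from this text; `total_loss` is a hypothetical name.

```python
import numpy as np

def total_loss(e, y, p, e_max, beta=0.1, lam=0.5, eps=1e-7):
    """Return (total, mean sigmoid BCE, structural loss) for one batch."""
    p = np.clip(p, eps, 1.0 - eps)
    bce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))  # per-element BCE
    f = e > beta * e_max                                  # hard-example mask
    m = max(int(f.sum()), 1)                              # avoid dividing by 0
    l_ssl = float((e * f * bce).sum() / m)                # mean over hard examples
    l_bce = float(bce.mean())                             # mean over all elements
    return lam * l_bce + (1.0 - lam) * l_ssl, l_bce, l_ssl

# Toy values (illustrative only): rows are pixels n, columns are channels c.
e = np.array([[0.05, 0.80], [0.02, 1.00]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.6], [0.1, 0.3]])
total, l_bce, l_ssl = total_loss(e, y, p, e_max=1.0)
print(total, l_bce, l_ssl)
```

Because each channel is handled by an independent sigmoid, the arrays here stand for C binary problems per pixel rather than one softmax over C classes.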

Compared with the prior art, the invention has the following beneficial effects:

1. The proposed structural loss function provides an intuitive way to measure the structural similarity between two images. It can be implemented fairly easily with convolutions and needs only a small amount of additional computation during training, so the proposed method can be integrated easily into any existing segmentation framework.

2. With the proposed semantic segmentation method, the segmentation model is easy to train and requires no additional inference step or extra network structure. Extensive experiments show that a segmentation model trained with the proposed method outperforms the baseline algorithms and several other methods of the same kind.

Brief description of the drawings

Fig. 1 is the overall framework and flow diagram of the invention;

Fig. 2 is a schematic diagram of the label image and the prediction image after normalization in an embodiment of the invention;

Fig. 3 is a schematic diagram of the statistics of the normalized label image and prediction image in an embodiment of the invention;

Fig. 4 is a schematic diagram of the proportion of hard examples among all examples during training;

Fig. 5 shows qualitative segmentation results of an embodiment of the invention on the PASCAL VOC 2012 validation set.

Specific embodiment

The invention is described in further detail below with reference to the drawings and embodiments. The embodiments described below are intended to aid understanding of the invention and do not limit it in any way.

As shown in Fig. 1, in a semantic segmentation method based on a loss function that maximizes the correlation between prediction and label, once the prediction image output by the segmentation model has been obtained, the prediction image and the label image are locally normalized. The strength of the correlation between them is then computed to give the structural difference, which is used to re-weight the original cross-entropy loss while hard example mining is performed. The model parameters are updated according to the resulting loss value, and these steps are repeated until training stops, yielding an image semantic segmentation model with good performance.

Fig. 2 shows how the pixel values of the original prediction image and label image change before and after normalization, and how the cross-entropy loss changes as a result. Before normalization, the sigmoid cross-entropy loss between the original prediction image and label image is about 2.805; the center point is misclassified, and its cross-entropy loss accounts for about 57% of the total. After normalization, and after the original cross-entropy loss has been re-weighted by the structural difference, the sigmoid cross-entropy loss between the normalized prediction image and label image is about 3.060, of which the misclassified center point accounts for about 91%. Thus, after normalization the loss of the points that are inconsistent between the two local regions is amplified, so the segmentation model is penalized more heavily when it produces inconsistent predictions and is guided toward a better local convergence point.

Fig. 3 records, over one training run, the maximum, minimum, mean, and median of the normalized label image and of the normalized prediction image, together with the maximum of the structural difference; the Gaussian kernel size used is 3. As Fig. 3 shows, the extreme values of the normalized prediction image p^nor are smaller in magnitude than the corresponding extreme values of the normalized label image y^nor, and the maximum structural difference e_max is clearly larger than the extreme values of y^nor. The mean e_mean and median e_median of the structural difference are both close to 0, and at any given moment e_mean is greater than e_median.

To further analyze the influence of the hard example mining strategy adopted in the proposed structural loss function, Fig. 4 records, for different values of the threshold parameter β used to select hard examples, the proportion of hard examples among all examples and how this proportion changes over one training run.

As shown in Fig. 4, the proportion of hard examples is very sensitive to the threshold parameter β: a change in β has a large effect on the proportion of hard examples, so the choice of β is critical. The value of β used in the invention is 0.1.

Fig. 5 shows the segmentation results of segmentation models trained with the proposed algorithm and with the conventional method. The results of the model trained with the proposed algorithm are clearly better visually than those of the model trained with the conventional method, which qualitatively demonstrates the effectiveness of the proposed algorithm.

The proposed method is applied below to concrete examples and compared with other methods of the same type to demonstrate the technical effect and advantages of the invention.

The segmentation models used are DeepLabv3 and DeepLabv3+, two state-of-the-art semantic segmentation models. The performance of the segmentation model trained with the proposed method is compared with that obtained with the conventional cross-entropy loss.

Experiments are carried out on two large public datasets, PASCAL VOC 2012 and Cityscapes. The PASCAL VOC 2012 dataset is divided into three parts, a training set, a validation set, and a test set, containing 1464, 1449, and 1456 images respectively. For training, the invention uses an augmented version of PASCAL VOC 2012 containing 10582 images. Cityscapes is a high-resolution dataset whose images are 2048 × 1024 pixels; its training, validation, and test sets contain 2975, 500, and 1525 images respectively.

The evaluation metric is the mean intersection-over-union (mIoU) score, that is, the ratio of the intersection to the union of the segmented objects in the prediction image and the label image. The effectiveness of the algorithm is first verified on the PASCAL VOC 2012 validation set, with results shown in Table 1. In Table 1, CE and BCE denote the conventional softmax and sigmoid cross-entropy losses respectively, and the Gaussian kernel size is the size of the local region used. The table shows that segmentation models trained with the proposed algorithm perform better than those trained with the conventional method. Table 1 also shows how the effect of the proposed algorithm depends on the Gaussian kernel size.
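For reference, the mIoU metric can be sketched as below. This is a simplified per-image version with a hypothetical function name; benchmark evaluations normally accumulate intersections and unions over the whole dataset before averaging, and skipping classes absent from both images is a choice made for this sketch.

```python
import numpy as np

def mean_iou(pred, label, num_classes):
    """Mean over classes of |pred ∩ label| / |pred ∪ label|.

    Classes absent from both pred and label are skipped.
    """
    pred = np.asarray(pred).ravel()
    label = np.asarray(label).ravel()
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (label == c))
        union = np.sum((pred == c) | (label == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

label = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
print(mean_iou(pred, label, num_classes=2))
```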

In addition, the proposed method is compared with several methods of the same type on the PASCAL VOC 2012 validation set. The comparison is shown in Table 2.

Table 2 shows the improvement of the GAN-based method over its baseline (Base), as well as the improvements of the CRF method and the Affinity method over their baselines (CE, BCE). Compared with these methods, the proposed algorithm shows the largest improvement over its baseline. Because the experimental settings were changed, the mIoU scores in Table 2 are not consistent with those in Table 1.

Table 1

Table 2

Further, the effectiveness of the proposed algorithm is also verified on the Cityscapes validation set, with results shown in Table 3.

Table 3

The embodiments above describe the technical solution and beneficial effects of the invention in detail. It should be understood that they are only specific embodiments of the invention and are not intended to limit it; any modification, supplement, or equivalent replacement made within the spirit of the invention shall fall within its scope of protection.

Claims

1. A semantic segmentation method based on a loss function that maximizes the correlation between prediction and label, characterized by comprising:

(1) inputting a real-scene image into a segmentation model to obtain a prediction image;

(2) sliding a Gaussian kernel over the prediction image and the label image by convolution to obtain local statistical features, including the local mean and variance;

(3) computing, from the local statistical features, the strength of the linear correlation between corresponding regions of the prediction image and the label image;

(4) using the index of linear-correlation strength as a weight to adjust the cross-entropy loss of each pixel in the image, and performing hard example mining;

(5) computing the structural loss function of the hard examples in each training batch, then computing the total loss function used to optimize the segmentation model, and updating the weight parameters of the segmentation model;

(6) repeating steps (1) to (5), ending training after a preset number of training iterations, and applying the trained model to semantic segmentation.

2. The semantic segmentation method based on a loss function that maximizes the correlation between prediction and label according to claim 1, characterized in that, in step (2), a Gaussian kernel w = {w_i | i = 1, 2, ..., k²} with a standard deviation of 1.5 is used to obtain the local statistical features, where the local statistical features of the label image are as follows:

where the kernel weights are normalized to sum to 1, μ_y and σ_y² are the local mean and local variance of the label image, and y_i ∈ {0,1} is the value of a pixel in the label image.

3. The semantic segmentation method based on a loss function that maximizes the correlation between prediction and label according to claim 1, characterized in that, in step (3), the index of the strength of the linear correlation between corresponding regions of the prediction image and the label image is computed as:

where the error e characterizes the strength of the correlation between two local regions, a smaller e indicating a stronger correlation; μ_y and σ_y are the local mean and local standard deviation of the label image; the pixel corresponding to label y lies at the center of the local region; μ_p and σ_p are the local mean and local standard deviation of the prediction image; p is the probability predicted by the segmentation model; and C₄ = 0.01 is a stabilizing factor.

4. The semantic segmentation method based on a loss function that maximizes the correlation between prediction and label according to claim 1, characterized in that, in step (4), the following formulas are used to adjust the cross-entropy loss of each pixel in the image and to perform hard example mining:

f_n,c = 1{e_n,c > β e_max},

where n and c are the coordinates of the current pixel in the image; e_max is the theoretical maximum of the error e; the indicator 1{·} equals 1 when the condition inside it is true and 0 otherwise; β ∈ [0,1) is a weight factor used to select the examples to be discarded; y_n,c and p_n,c are the label and predicted probability of the current pixel; the loss being weighted is the conventional sigmoid cross-entropy loss function; and the result is the structural loss function, which maximizes the correlation between prediction and label.

5. The semantic segmentation method based on a loss function that maximizes the correlation between prediction and label according to claim 4, characterized in that the value of β is set to 0.1.

6. The semantic segmentation method based on a loss function that maximizes the correlation between prediction and label according to claim 1, characterized in that, in step (5), the structural loss function of the hard examples in each training batch is:

where the normalizing count is the number of hard examples, f_n,c being 1 when the pixel at image coordinate (n, c) is a hard example and 0 otherwise; N is the total number of pixels in the image, and C is the number of object classes; summing and averaging the structural loss values of the individual pixels gives the total structural loss value of the current training batch.

7. The semantic segmentation method based on a loss function that maximizes the correlation between prediction and label according to claim 6, characterized in that the total loss function is:

where y and p denote the label image and the prediction image respectively; λ ∈ [0,1] is a weight factor that balances the relative importance of the conventional cross-entropy loss and the structural similarity loss; the conventional cross-entropy loss measures the similarity of pixel intensities between the prediction image and the label image; and the structural similarity loss measures the structural similarity between the prediction image and the label image.

8. The semantic segmentation method based on a loss function that maximizes the correlation between prediction and label according to claim 7, characterized in that the value of λ is set to 0.5.