CN109919159A - Semantic segmentation optimization method and device for edge images

Info

Publication number: CN109919159A
Application number: CN201910059828.7A
Authority: CN
Inventors: 赵伟; 傅一; 王立豪; 秦红波; 王中正; 王海
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-01-22
Filing date: 2019-01-22
Publication date: 2019-06-21

Abstract

The present invention relates to a semantic segmentation optimization method for edge images, comprising: selecting image data; training and validating an image semantic segmentation model and a fully connected conditional random field model using the image data; obtaining a semantic segmentation result of an image using the trained image semantic segmentation model; obtaining a superpixel segmentation result containing image edge information using a superpixel segmentation algorithm; optimizing the semantic segmentation result using the superpixel segmentation result to form a first optimization result; and optimizing the first optimization result using the trained fully connected conditional random field model. The proposed method efficiently extracts high-level semantic information from an image, preserves image edge information through the superpixel segmentation algorithm, and improves the accuracy of existing segmentation models at image edges through a local edge optimization algorithm. It is flexible to implement, highly compatible, and robust.

Description

Semantic segmentation optimization method and device for edge images

Technical field

The invention belongs to the technical field of image processing, and in particular relates to a semantic segmentation optimization method and device for edge images.

Background art

With the continuous improvement of computer science and the continuous development of multimedia and Internet technology, computer vision, an important branch of computer science typified by digital image processing, is gradually reaching every corner of modern society. Image semantic segmentation is one of the fundamental problems in computer vision. Its goal is to classify every pixel of an image, dividing the image into several regions that are mutually independent, do not overlap, and each carry their own visual meaning, and to assign each region a visual label to facilitate subsequent image analysis and visual understanding. From a macroscopic perspective, semantic segmentation can be regarded as a preprocessing step for scene understanding, which has always been a core problem in computer vision; as demand grows for extracting semantic information from multimedia such as images and video, image semantic segmentation becomes ever more important. From a microscopic perspective, the goal of semantic segmentation is pixel-level classification: each pixel is classified individually, and the result is the semantic labeling of the whole image. From a practical perspective, image semantic segmentation accomplishes two tasks: segmenting objects and recognizing them.

Since the middle and late twentieth century, researchers have devoted themselves to image semantic segmentation, and over more than half a century of accumulated work they have proposed many segmentation algorithms for different scenarios.

Thresholding is one of the most basic methods in image segmentation. It segments an image according to differences in the color or gray value of its pixels. Its semantic recognition performance is mediocre and its drawbacks are obvious: when the gray values of pixels are close or their colors differ little, the error rate is high; and because the choice of threshold in practice is affected by noise and illumination, a suitable threshold is very hard to obtain, which narrows the applicability of the algorithm.

Edge-detection-based methods are another class of conventional segmentation methods. The basic idea is to exploit the feature discontinuity between regions to detect the edge points in an image and then link the points into lines according to a set strategy until closed regions are formed. When the gray values change sharply at the edges and the image is almost noise-free, the results are good, but when the edges are complex the segmentation is unsatisfactory. These algorithms are therefore suited to images with distinct gray-level transitions at the edges and low overall noise.

Interactive image segmentation is based on the idea of graph partitioning. The algorithm requires manually supplied cues to distinguish the classes. Binary segmentation is common: in a typical interaction the user boxes the foreground object or draws strokes along the boundary between foreground and background, and the algorithm then uses the manually added information as constraints to generate the segmentation result automatically. Because interactive methods require manual intervention, they are suitable only for a small number of images; for large numbers of images of complex scenes, frequent manual labeling is tedious and time-consuming.

Clustering-based segmentation compares the gray values of the pixels in an image and groups pixels with small differences into the same class. Although it requires no prior knowledge, feature extraction, or recognition, which lowers the difficulty of semantic segmentation, the clustering depends heavily on the choice of initial seed points: different initializations can produce very different segmentation results, and objects that are similar in color but belong to different classes are easily confused. Overall, its semantic segmentation accuracy is low.

Later, probabilistic graphical models (PGMs) gradually attracted researchers' attention. They mainly comprise generative models and discriminative models. The conditional random field (CRF), a representative discriminative model, is another typical probabilistic graphical model distinct from generative models. A CRF can express the relationships among observed variables, including color and positional relationships. The model has been very successful and has become one of the most widely used models for image semantic segmentation. Later researchers proposed many improved models on this basis, such as the fully connected conditional random field, which, while performing semantic segmentation, takes into account low-level details of the image such as texture, global context, and a smoothness prior, and thereby significantly improves the segmentation results.

Traditional image processing algorithms must process every pixel. With the rapid development of multimedia technology, everyday images are becoming sharper and higher in resolution, and traditional algorithms face challenges in both time complexity and space complexity. The heavy resource consumption of these algorithms forced researchers to look for new solutions, and superpixel segmentation emerged: by moving the problem from the pixel level to the region level, it reduces both computation time and memory usage. In general, a superpixel is a set of similar pixels. In ordinary images a superpixel is often part of some object; because its pixels belong to the same object, they are similar in low-level features such as color, position, and texture. Superpixels are separated from one another by natural edges and do not overlap. Because superpixel segmentation algorithms are efficient, using them as a preprocessing step for subsequent image processing can greatly reduce the overall computational complexity. And because the resulting superpixels carry some of the features of the objects, combining them with low-level detail yields the structural information of the image. Superpixel segmentation is now used as a key step in more and more application areas, such as image segmentation, object detection, image classification, and object recognition.

Traditional image semantic segmentation algorithms mostly rely on the low-level cues carried by the image pixels themselves and segment the image directly according to some strategy. Because they operate directly on pixels and require no parameter tuning, they are relatively simple, but obtaining acceptable results depends on manually supplied information. Today some traditional algorithms are often used for image preprocessing or post-processing alongside neural networks. Amid the global wave of deep learning, computer vision researchers have achieved repeated successes in image semantic segmentation, and the milestone fully convolutional network (FCN) model was born in this context. Although existing image semantic segmentation techniques reach a good level of overall accuracy, they often cannot localize individual objects precisely. When objects overlap or occlude one another in complex ways, existing algorithms handle them poorly: typically several objects stick together, edges cannot be clearly delineated, and objects may be mislabeled as other classes.

Summary of the invention

To solve the above problems in the prior art, the present invention provides a semantic segmentation optimization method and device for edge images. The technical problem addressed by the invention is solved by the following technical solutions:

An embodiment of the present invention provides a semantic segmentation optimization method for edge images, comprising:

selecting image data;

training and validating an image semantic segmentation model and a fully connected conditional random field model using the image data;

obtaining a semantic segmentation result of an image using the trained image semantic segmentation model;

obtaining a superpixel segmentation result containing image edge information using a superpixel segmentation algorithm;

optimizing the semantic segmentation result using the superpixel segmentation result to form a first optimization result;

optimizing the first optimization result using the trained fully connected conditional random field model.

In one embodiment of the present invention, selecting the image data comprises:

selecting one of the VOC dataset and the Cityscapes dataset as the image data.

In one embodiment of the present invention, training and validating the image semantic segmentation model and the fully connected conditional random field model using the image data comprises:

dividing the image data into a training set and a validation set;

taking the training set as input, training the image semantic segmentation model and the fully connected conditional random field model by iterative supervised training;

taking the validation set as input, validating the trained image semantic segmentation model and the trained fully connected conditional random field model.

In one embodiment of the present invention, the image semantic segmentation model is an FCN-8s model.

In one embodiment of the present invention, the superpixel segmentation algorithm is the SLIC superpixel segmentation algorithm.

In one embodiment of the present invention, optimizing the semantic segmentation result using the superpixel segmentation result to form the first optimization result comprises:

assigning semantic labels to the superpixel segmentation result to form feature labels;

optimizing the feature labels using a local edge optimization algorithm to form the first optimization result.

Another embodiment of the present invention provides a semantic segmentation optimization device for edge images, characterized by comprising:

a data selection module, for selecting image data;

a training and validation module, for training and validating an image semantic segmentation model and a fully connected conditional random field model using the image data;

a semantic segmentation module, for obtaining a semantic segmentation result of an image using the trained image semantic segmentation model;

a superpixel segmentation module, for obtaining a superpixel segmentation result containing image edge information using a superpixel segmentation algorithm;

a first optimization module, for optimizing the semantic segmentation result using the superpixel segmentation result to form a first optimization result;

a second optimization module, for optimizing the first optimization result using the trained fully connected conditional random field model.

Compared with the prior art, the present invention has the following beneficial effects:

1. The present invention is a semantic segmentation optimization method for edge images that aims to optimize the segmentation results of existing algorithms. In a specific implementation, different superpixel segmentation algorithms can be used as needed, so the method is flexible to implement and highly compatible. In terms of performance, it not only efficiently extracts the high-level semantic information in an image but also uses low-level image information to segment image edges accurately, while remaining robust to propagation errors.

2. The present invention uses superpixels to preserve the object edges in an image and, through a local edge optimization algorithm, improves the semantic segmentation accuracy of existing segmentation models at image edges.

3. The present invention uses a fully connected conditional random field to constrain pixels that are similar in color and spatial position, making full use of the relationships between the pixels of the image to further optimize the semantic segmentation result, so that image edges are segmented more accurately.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a semantic segmentation optimization method for edge images provided by an embodiment of the present invention;

Fig. 2 is an implementation block diagram of a semantic segmentation optimization method for edge images provided by an embodiment of the present invention;

Fig. 3 is an implementation block diagram of the local edge optimization algorithm in a semantic segmentation optimization method for edge images provided by an embodiment of the present invention;

Fig. 4 shows the superpixel-based edge optimization results and partial enlarged views in a semantic segmentation optimization method for edge images provided by an embodiment of the present invention;

Fig. 5 shows the optimization results and partial enlarged views of the precise edge recovery achieved with the fully connected conditional random field in a semantic segmentation optimization method for edge images provided by an embodiment of the present invention;

Fig. 6 is a histogram comparing the segmentation accuracy on the VOC dataset of a semantic segmentation optimization method for edge images provided by an embodiment of the present invention and of existing image semantic segmentation methods;

Fig. 7 is a histogram comparing the segmentation accuracy on the Cityscapes dataset of a semantic segmentation optimization method for edge images provided by an embodiment of the present invention and of existing image semantic segmentation methods.

Detailed description of the embodiments

The present invention is described in further detail below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.

Embodiment 1

Refer to Fig. 1 and Fig. 2. Fig. 1 is a schematic flowchart of a semantic segmentation optimization method for edge images provided by an embodiment of the present invention, and Fig. 2 is an implementation block diagram of the same method. The method comprises:

selecting image data;

training and validating an image semantic segmentation model and a fully connected conditional random field model using the image data;

obtaining a semantic segmentation result of an image using the trained image semantic segmentation model;

obtaining a superpixel segmentation result containing image edge information using a superpixel segmentation algorithm;

optimizing the semantic segmentation result using the superpixel segmentation result to form a first optimization result;

optimizing the first optimization result using the trained fully connected conditional random field model.

Specifically, this embodiment uses as image data two datasets that are widely used in their respective fields, each with its standard split. The first is the VOC dataset, which has essentially become the benchmark for comprehensively evaluating new semantic segmentation algorithms. It contains 21 semantic classes: 20 foreground classes and 1 background class. The standard VOC dataset comprises 1464 images in the training set, 1449 images in the validation set, and 1456 images in the test set, used for training, validation, and testing respectively; the training set and the validation set both include pixel-level ground-truth semantic labels. The second is the Cityscapes dataset, an urban-scene dataset that focuses on the semantic understanding of city street scenes and is divided into 19 semantic classes. The standard Cityscapes dataset has 2975 images in the training set, 500 images in the validation set, and 1525 images in the test set.

After the image data is selected, the image semantic segmentation model and the fully connected conditional random field model need to be trained and validated.

In this embodiment the image semantic segmentation model is an FCN model, which extracts the coarse features of the image. Unlike a traditional convolutional neural network, an FCN accepts input images of arbitrary size and produces output of corresponding size, giving pixel-level classification results. An FCN can be converted from an existing convolutional neural network; the FCN used here is converted from VGG-16 of the VGGNet family.

VGG-16 is converted into an FCN by replacing the fully connected layers of the original network with convolutional layers while retaining the structure of the first five layers. Throughout feature extraction, repeated convolution and pooling operations make the resolution of the feature maps lower and lower. To restore the final output to the same size as the input image, the intermediate outputs must be upsampled. In the implementation, relative to the original input image, the resolution of the feature maps is reduced by factors of 2, 4, 8, 16, and 32 in turn. Directly upsampling the coarse features output by the last layer by a factor of 32 yields the FCN-32s result, but because the upsampling factor is so large, the FCN-32s output lacks much detail and is not accurate enough. To improve accuracy, the finer detail available in the preceding, higher-resolution layers is added to FCN-32s; combining this detail with the FCN-32s output yields the FCN-16s and then the FCN-8s results.
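The fusion of coarse and fine predictions described above can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: nearest-neighbor upsampling stands in for the learned upsampling layers of a real FCN, the score maps are random toy data, and the function names `upsample` and `fcn8s_fuse` are hypothetical rather than taken from the patent.

```python
import numpy as np

def upsample(scores, factor):
    """Nearest-neighbor upsampling of a (classes, H, W) score map.

    A real FCN uses learned upsampling; this is a simple stand-in.
    """
    return scores.repeat(factor, axis=1).repeat(factor, axis=2)

def fcn8s_fuse(score32, score16, score8):
    """Fuse class scores at strides 32, 16 and 8 into a full-resolution label map."""
    fused16 = upsample(score32, 2) + score16  # FCN-16s: stride-32 scores plus stride-16 detail
    fused8 = upsample(fused16, 2) + score8    # FCN-8s: add stride-8 detail as well
    full = upsample(fused8, 8)                # back to the input resolution
    return full.argmax(axis=0)                # per-pixel class index

# Toy example: 21 classes (as in VOC) for a 64 x 64 input image.
rng = np.random.default_rng(0)
s32 = rng.normal(size=(21, 2, 2))
s16 = rng.normal(size=(21, 4, 4))
s8 = rng.normal(size=(21, 8, 8))
labels = fcn8s_fuse(s32, s16, s8)
print(labels.shape)  # (64, 64)
```

The sketch makes the trade-off visible: the stride-32 scores are upsampled only in small steps and corrected at each step by a higher-resolution score map, instead of being enlarged 32 times in one jump.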

In this embodiment the image data is divided into training-set images and validation-set images, and the FCN-8s model is trained under supervision using the training-set images and their ground-truth semantic labels. So that the network learns more advanced and abstract image features, this embodiment performs 50 rounds of iterative supervised training, in which the model obtained from the previous round serves as the initial model for the next round. The hyperparameters of the fully connected conditional random field are determined by cross-validation. First, ω₂ and σ_γ are set. These two parameters have little effect on classification accuracy and mainly affect smoothness. Their initial values are ω₂ = 1 and σ_γ = 1, but based on the test results this embodiment finally sets ω₂ = 3 and σ_γ = 3. For the three hyperparameters ω₁, σ_α, and σ_β, this embodiment uses a coarse-to-fine search strategy. A small number of images from the training dataset are selected for the search, and the initial values of the three parameters are set to ω₁ = 3, σ_α = 30, and σ_β = 3. The initial search ranges are ω₁ ∈ [3:6], σ_α ∈ [30:10:100], and σ_β ∈ [3:6], meaning that ω₁ and σ_β are searched from 3 to 6 in steps of 1 and σ_α is searched from 30 to 100 in steps of 10. After each round of search, the search is repeated in the range around the best value with the step halved, until the search finally stops, so that the conditional random field parameters used in this embodiment are well tuned. The search yields ω₁ = 5, σ_α = 49, and σ_β = 3.
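The coarse-to-fine search can be sketched as a grid search that re-centers on the best point and halves the step each round. This is a minimal sketch, not the patent's implementation: `score_fn` is a hypothetical callable that would, in this setting, run the conditional random field on the small image subset and return a validation score, and the toy score below is synthetic.

```python
import itertools

def coarse_to_fine_search(score_fn, ranges, rounds=3):
    """Grid search that narrows around the best point and halves the step each round.

    score_fn: hypothetical callable mapping a parameter dict to a score (higher is better)
    ranges:   {name: (low, high, step)}
    """
    best_params, best_score = None, float("-inf")
    for _ in range(rounds):
        names = list(ranges)
        grids = []
        for name in names:
            low, high, step = ranges[name]
            n = int(round((high - low) / step))
            grids.append([low + i * step for i in range(n + 1)])
        for values in itertools.product(*grids):
            params = dict(zip(names, values))
            score = score_fn(params)
            if score > best_score:
                best_params, best_score = params, score
        # Keep one old step on each side of the best value, then halve the step.
        ranges = {
            name: (max(ranges[name][0], best_params[name] - ranges[name][2]),
                   min(ranges[name][1], best_params[name] + ranges[name][2]),
                   ranges[name][2] / 2)
            for name in names
        }
    return best_params, best_score

# Synthetic score with a peak that lies between the points of the first grid.
def toy_score(p):
    return -((p["w1"] - 4.5) ** 2
             + ((p["sigma_alpha"] - 45) / 10) ** 2
             + (p["sigma_beta"] - 3) ** 2)

best, score = coarse_to_fine_search(
    toy_score,
    {"w1": (3, 6, 1), "sigma_alpha": (30, 100, 10), "sigma_beta": (3, 6, 1)},
)
print(best)
```

The initial ranges mirror the ones given in the text; the number of rounds and the stopping rule are assumptions, since the source only says the search continues until it stops.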

After the image semantic segmentation model and the fully connected conditional random field model have been trained, the image semantic segmentation result is obtained with the trained image semantic segmentation model.

Specifically, the image is fed into the trained FCN-8s model to obtain its semantic segmentation result. On the one hand, because receptive fields differ, after the first few convolution operations the resolution is relatively high: pixel classification is less accurate, but the localization of each pixel is more accurate. On the other hand, after the last few convolutions the resolution is relatively low: pixel localization is not accurate enough, but pixel classification is more accurate. The FCN-8s model has a small receptive field and is suited to capturing detail, so its result is the closest to the ground-truth semantic labels. The FCN-8s result is nevertheless still not sensitive enough to image detail, and is therefore called the coarse result.

After the coarse semantic segmentation result of the image is obtained, a superpixel segmentation algorithm is used to obtain a superpixel segmentation result containing image edge information.

To obtain superpixels that adhere better to edges, this embodiment applies the SLIC superpixel segmentation algorithm to the image to be segmented. Because image resolutions vary, the parameters are adjusted dynamically according to the image resolution during actual superpixel segmentation, with the strategy of keeping the number of pixels in each superpixel within the range [200, 500]. For example, for a 500 × 500 image the number of superpixels is set to 1000; for a 1024 × 2048 image it is set to 6000. The specific steps are as follows:
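A short sketch can check the two example settings against the [200, 500] range and show one way a count could be chosen for other resolutions. The `target` parameter and the helper names are hypothetical; the source states only the range and the two examples, not a selection rule.

```python
def pixels_per_superpixel(height, width, n_superpixels):
    """Average number of pixels covered by each superpixel."""
    return height * width / n_superpixels

def choose_superpixel_count(height, width, target=350, low=200, high=500):
    """Pick a superpixel count whose average size is close to `target` pixels.

    `target` is a hypothetical tuning knob, not a value from the source.
    """
    n = max(1, round(height * width / target))
    size = pixels_per_superpixel(height, width, n)
    if not low <= size <= high:
        raise ValueError(f"image too small for the [{low}, {high}] range")
    return n

print(pixels_per_superpixel(500, 500, 1000))    # 250.0
print(pixels_per_superpixel(1024, 2048, 6000))  # about 349.5
print(choose_superpixel_count(500, 500))        # 714
```

Both example settings from the text fall inside the stated range, at roughly 250 and 350 pixels per superpixel.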

The SLIC algorithm is essentially a local K-means clustering algorithm. Assume that the image contains N_p pixels in total and that the desired number of superpixels is N_s. In the image plane, with the pixel as the basic unit and starting from row Row, N_s initial cluster centers are chosen uniformly with step S in the horizontal and vertical directions, where Row equals S/2. The step is computed as:
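The step formula does not appear in this text. Assuming the standard SLIC grid initialization, in which each of the N_s superpixels initially covers an approximately square cell of N_p / N_s pixels, the step would be:

```latex
\[
S = \sqrt{\frac{N_p}{N_s}}
\]
```

For instance, under this assumption a 500 × 500 image with N_s = 1000 gives S = √250 ≈ 15.8 pixels.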

To speed up superpixel generation, the SLIC algorithm assigns each pixel to its nearest cluster center within a local rectangular window of size 2S × 2S. The gray value g and the position (x, y) are chosen to form the feature vector of a pixel. Let the feature vector of any pixel be f_i = [g_i, x_i, y_i] and the feature vector of any cluster center be f_c = [g_c, x_c, y_c]. The distance D_s between pixel p_i and cluster center p_c is then computed as:
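The distance formula does not appear in this text. One common SLIC form that matches the symbols D_s, d_g, d_xy, S, and m used here is given below as an assumption; it is not confirmed by the source, and some SLIC variants instead combine the two terms under a square root.

```latex
\[
D_s = d_g + \frac{m}{S}\, d_{xy}
\]
```

In this form, dividing by S normalizes the spatial distance by the grid spacing, and a larger m gives the spatial term more weight, which is consistent with the statement that larger m produces more regular superpixels.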

Here d_g and d_xy are the gray-level distance and the spatial distance respectively, S is the step, and m is the parameter that controls the compactness and regularity of the superpixels. The larger the value of m, the more regular the generated superpixels; the usual value range is k′ = {500, 1000, 1500, 2000, 2500}. The gray-level distance d_g and the spatial distance d_xy are computed respectively as:
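The two component formulas do not appear in this text. Assuming Euclidean distances over the feature vectors f_i = [g_i, x_i, y_i] and f_c = [g_c, x_c, y_c], they would read:

```latex
\[
d_g = \sqrt{(g_i - g_c)^2} = \lvert g_i - g_c \rvert,
\qquad
d_{xy} = \sqrt{(x_i - x_c)^2 + (y_i - y_c)^2}
\]
```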

Here g_i and g_c are the gray values of pixel p_i and cluster center p_c respectively, x_i and x_c are their coordinates along the X axis, and y_i and y_c are their coordinates along the Y axis.

The SLIC superpixel segmentation algorithm proceeds as follows:

1. In the image plane, with the pixel as the basic unit and S as the step in the vertical and horizontal directions, choose N_s cluster centers uniformly, starting from the pixels of row Row.

2. To prevent a cluster center from falling on an edge pixel or a noise pixel of the image, compute the gradient of every pixel in the Ns × Ns neighborhood of each cluster center and choose the pixel with the smallest gradient as the new cluster center.

3. Set an iteration variable θ, initialized to 0. Within a 2S × 2S search window, assign each pixel to the cluster center nearest to it, obtaining R clusters of similar pixels.

4. Compute the mean feature vector of all pixels in each cluster of similar pixels and use it to update the center of that cluster.

5. Determine whether the iteration variable θ is greater than the iteration threshold Ω. If so, the algorithm terminates and yields N_s superpixels (each cluster of similar pixels is one superpixel); otherwise, increment θ by 1 and return to step 3.

Empirical data suggest that 10 iterations are enough for the cluster centers to change by no more than 5% between two consecutive iterations, so the number of iterations is generally set to 10.
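The procedure above can be sketched in NumPy for a grayscale image. This is a simplified illustration under stated assumptions, not the patent's implementation: it omits the gradient-based seed adjustment of step 2 and any connectivity post-processing, runs a fixed number of iterations, and assumes the distance d_g + (m / S) · d_xy. The function name `slic_gray` and the default `m` are hypothetical.

```python
import numpy as np

def slic_gray(image, n_superpixels, m=10.0, n_iter=10):
    """Simplified grayscale SLIC: local K-means on (gray, x, y) features.

    Assumes the distance d_g + (m / S) * d_xy; omits seed adjustment (step 2).
    """
    image = image.astype(float)
    h, w = image.shape
    S = max(1, int(round(np.sqrt(h * w / n_superpixels))))  # grid step
    # Step 1: seeds on a regular grid starting at offset S/2.
    ys, xs = np.meshgrid(np.arange(S // 2, h, S), np.arange(S // 2, w, S), indexing="ij")
    centers = np.stack([image[ys, xs].ravel(),
                        xs.ravel().astype(float),
                        ys.ravel().astype(float)], axis=1)   # rows: [g, x, y]
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), dtype=int)
    for _ in range(n_iter):                                  # steps 3 to 5
        dist = np.full((h, w), np.inf)
        for k, (g_c, x_c, y_c) in enumerate(centers):
            # Search window of roughly 2S x 2S around the center.
            y0, y1 = max(0, int(y_c) - S), min(h, int(y_c) + S + 1)
            x0, x1 = max(0, int(x_c) - S), min(w, int(x_c) + S + 1)
            d_g = np.abs(image[y0:y1, x0:x1] - g_c)
            d_xy = np.hypot(xx[y0:y1, x0:x1] - x_c, yy[y0:y1, x0:x1] - y_c)
            d = d_g + (m / S) * d_xy
            closer = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][closer] = d[closer]
            labels[y0:y1, x0:x1][closer] = k
        # Step 4: move each center to the mean feature vector of its cluster.
        for k in range(len(centers)):
            mask = labels == k
            if mask.any():
                centers[k] = [image[mask].mean(), xx[mask].mean(), yy[mask].mean()]
    return labels

# Toy image: dark left half, bright right half.
img = np.zeros((20, 20))
img[:, 10:] = 255
labels = slic_gray(img, n_superpixels=4)
print(labels.shape, len(np.unique(labels)))
```

On the toy image no superpixel crosses the vertical boundary between the two halves, which illustrates how the gray-level term makes superpixel borders follow image edges.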

Because the edges of the generated superpixels adhere to the edges in the image, the resulting superpixels describe the edge information of the image well.

After the superpixel segmentation result containing image edge information is obtained, the coarse image semantic segmentation result is optimized according to that superpixel segmentation result.

The core idea of the edge optimization algorithm proposed by the present invention is to use the pixel-level feature map output by the FCN to assign semantic labels to all pixels within each superpixel. Several cases can arise in this process; the pseudocode is given below. Superpixels are first divided into two cases according to whether they contain an image edge, and each of these is further divided according to whether all pixels share the same semantic label. For ease of description: a superpixel that contains no image edge and whose pixels all have the same semantic label is denoted case A; a superpixel that contains no image edge but whose pixels have several semantic labels is denoted case B; a superpixel that contains an image edge but whose pixels all have the same semantic label is denoted case C; and a superpixel that contains an image edge and whose pixels have several semantic labels is denoted case D. These cases are analyzed in detail below.
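The four cases can be expressed as a small classification function. This is a minimal sketch: how the presence of an image edge inside a superpixel is detected is not specified here, so `contains_edge` is a hypothetical input supplied by the caller.

```python
from enum import Enum

class Case(Enum):
    A = "no edge, single label"
    B = "no edge, multiple labels"
    C = "edge, single label"
    D = "edge, multiple labels"

def classify_superpixel(pixel_labels, contains_edge):
    """Return the case of a superpixel from its pixel labels and an edge flag.

    contains_edge is a hypothetical flag; the edge test itself is not shown.
    """
    single_label = len(set(pixel_labels)) == 1
    if not contains_edge:
        return Case.A if single_label else Case.B
    return Case.C if single_label else Case.D

print(classify_superpixel([3, 3, 3], contains_edge=False))  # Case.A
print(classify_superpixel([3, 3, 7], contains_edge=True))   # Case.D
```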

As shown in Fig. 3, the basic steps of the local edge optimization algorithm are:

1. Let the input image be I and the coarse features be L;

2. Obtain K superpixels R = {R₁, R₂, ..., R_K} using the SLIC superpixel segmentation algorithm, where R_i denotes the single superpixel with index i;

3. Outer loop: for i = 1:K;

A. Let M = {C₁, C₂, ..., C_K} denote all pixels in R_i, where C_j denotes the pixels labeled as class j;

B. Obtain the feature of each pixel in C from the front end, and initialize the weight W_C to 0;

C. Inner loop: for j = 1:N;

Save the feature label of C_j, then update the weights of the labels over the whole superpixel, where each update starts from the previous value of the weight;

If the exit condition is satisfied, exit the inner loop; otherwise continue;

End;

D. Search all W_C to determine whether some weight is greater than 0.8. If so, go to the next step; otherwise, find the largest weight W_max and the second-largest weight W_sub and determine whether the difference between them is greater than 0.2. If so, go to the next step; otherwise continue the outer loop;

E. Relabel the current superpixel with the most likely class within the current superpixel;

End;

4. Obtain the optimized output result.
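The relabeling rule in steps D and E can be sketched in plain Python. This is an illustration under a stated assumption: the weight update formula is not available in this text, so the weight of a label is taken here to be the fraction of the superpixel's pixels that carry it. The function name `refine_labels` and the flat-list data layout are hypothetical.

```python
from collections import Counter

def refine_labels(seg, superpixels, dominant=0.8, margin=0.2):
    """Relabel each superpixel with its most frequent class when that class is clear.

    seg, superpixels: equal-length flat lists of class labels and superpixel ids.
    Weights are label fractions within a superpixel (an assumption).
    """
    members = {}
    for idx, sp in enumerate(superpixels):
        members.setdefault(sp, []).append(idx)
    out = list(seg)
    for idxs in members.values():
        counts = Counter(seg[i] for i in idxs).most_common(2)
        top_label, top_count = counts[0]
        w_max = top_count / len(idxs)
        w_sub = counts[1][1] / len(idxs) if len(counts) > 1 else 0.0
        # Step D: a weight above 0.8, or a lead of more than 0.2 over the runner-up.
        if w_max > dominant or (w_max - w_sub) > margin:
            for i in idxs:          # Step E: relabel the whole superpixel
                out[i] = top_label
    return out

seg = [1] * 9 + [2] + [3, 3, 3, 4, 4, 4] + [7, 7, 7, 7, 8, 8, 9]
sp = [0] * 10 + [1] * 6 + [2] * 7
print(refine_labels(seg, sp))
```

In the toy input, the first superpixel is relabeled because one class covers 90% of it, the second is left untouched because two classes tie, and the third is relabeled because the leading class is ahead of the runner-up by more than 0.2.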

For case A, such superpixels generally lie in the background or in the main body of an object, and may also lie along clear, smooth edges in the image. Since the FCN model has assigned the same semantic label to all pixels inside the superpixel, no optimization is needed and the original semantic labels are kept. For case B, such superpixels also contain no image edge, so, as in case A, they are likely to lie in the background or in the main region of an object, or near a smooth edge. The upsampling operation of the FCN model may cause propagation errors, for example labeling background near an image edge as another class, and this is the main cause of case B. For such superpixels, since the misclassified pixels are in the minority, the semantic label with the largest share in the superpixel is reassigned to all of its pixels.

Cases C and D are those in which the superpixel contains an image edge; they differ in the semantic labels of the pixels. In case C, all pixels have the same semantic label, which indicates that the FCN model has identified the region as the main body of an object and that the region itself contains edges. For such superpixels, the existing semantic label is still kept, because when abstracting high-level semantic features the FCN model has a certain robustness to brightness, slight deformation, occlusion and similar conditions, and the internal edge may be a shadow produced by illumination. Case D is the most complex: such superpixels both contain an image edge and have been assigned different semantic labels, which tends to occur at small structures of the image or in local regions with occlusion or covering. For this case, the present embodiment uses adaptive processing. If the same semantic label has been assigned to 80% or more of the pixels, that label is used as the semantic label of the entire superpixel. If no label accounts for the majority of the pixels, optimization of the region is abandoned, because in this situation the superpixel cannot effectively distinguish the image edge, and applying optimization may be counterproductive.
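The handling of the four cases can be sketched in Python with NumPy as follows. This is a minimal sketch, not the embodiment's implementation: the function name `refine_superpixel_labels`, the `has_edge` lookup (how an edge inside a superpixel is detected is assumed to be decided elsewhere) and the array layout are all hypothetical.

```python
import numpy as np

def refine_superpixel_labels(labels, superpixels, has_edge, dominant=0.8):
    """Relabel pixels per superpixel following cases A to D.

    labels:      (H, W) integer class map from the front-end model.
    superpixels: (H, W) integer superpixel ids.
    has_edge:    mapping superpixel id -> bool (assumed given).
    """
    refined = labels.copy()
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        classes, counts = np.unique(labels[mask], return_counts=True)
        if len(classes) == 1:
            continue  # cases A and C: a single label, keep it
        share = counts / counts.sum()
        best = int(np.argmax(share))
        if not has_edge[int(sp)]:
            refined[mask] = classes[best]  # case B: majority label
        elif share[best] >= dominant:
            refined[mask] = classes[best]  # case D with a dominant label
        # case D without a dominant label: leave the region untouched
    return refined
```

Keeping the front-end labels whenever no label dominates mirrors the conservative choice described above: a superpixel that cannot separate the edge is not trusted to overrule the segmentation model.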

After the coarse image semantic segmentation result has been optimized with the superpixel segmentation result, an optimized image semantic segmentation result is formed; the fully connected conditional random field model is then used to optimize this result further.

Specifically, after local edge optimization, the semantic segmentation accuracy at weak edges, at small structures and in complex scenes still needs to be improved. Therefore, the present invention uses the fully connected conditional random field model to recover the edges of the image more accurately, that is, to further optimize the segmentation of the edge regions of the image, and thereby improve the overall image semantic segmentation accuracy.

According to the basic theory of conditional random fields, the per-pixel labels are treated as random variables and the relationships between pixels as edges; together they form a conditional random field. These labels can be modeled once a global observation is available, and the global observation is usually easy to obtain: it is usually the input image. More specifically, in a fully connected conditional random field, the input image with N pixels is the global observation I. A graph G = (V, E) is then given, where V and E denote the vertices and edges of the graph respectively. Let X be the vector composed of the random variables {X₁, X₂, ..., X_N}, where X_i is a random variable that denotes the label assigned to pixel i. From the input image I and the edge-optimized pixel-level semantic segmentation map, a fully connected conditional random field model is established, expressed by the probability distribution P(X):

where E(x) is the Gibbs energy of the labeling x ∈ L^N and Z(I) is the partition function. The fully connected conditional random field uses the energy function:

where ψ_u(x_i) denotes the unary potential, which represents the probability that pixel i is labeled x_i; in the present embodiment, the unary potential ψ_u(x_i) is the optimized semantic segmentation result. ψ_p(x_i, x_j) denotes the pairwise potential, which represents the probability that pixel i and pixel j are simultaneously labeled x_i and x_j, as given by:

where I_i and I_j denote color vectors, and p_i and p_j denote pixel positions; the hyperparameters σ_α, σ_β and σ_γ control the range of the Gaussian kernels; μ(x_i, x_j) is the label compatibility function, with μ(x_i, x_j) equal to 1 if x_i ≠ x_j and 0 otherwise. This means that nearby, similar pixels that are assigned different labels are penalized; in other words, similar pixels are encouraged to take the same label, while pixels whose "distance" differs greatly tend to be assigned different labels. For example, the probability that the two objects "road" and "vehicle" appear in the same picture should be much greater than the probability that "meadow" and "vehicle" appear together. The definition of "distance" depends on the pixel color and on the actual distance between pixels, so the fully connected conditional random field can segment the image along edges as far as possible.
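For reference, the usual fully connected conditional random field formulation that matches the symbols defined above (P(X), E(x), Z(I), ψ_u, ψ_p, μ, ω₁, ω₂, σ_α, σ_β, σ_γ) can be written as below. This is the standard form with Gaussian pairwise kernels; it is assumed, not confirmed, to be the form intended in this embodiment.

```latex
% Gibbs distribution over labelings
P(X) = \frac{1}{Z(I)} \exp\bigl(-E(x)\bigr)

% Energy: unary terms plus pairwise terms over all pixel pairs
E(x) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)

% Pairwise potential: appearance kernel (position and color) plus smoothness kernel (position)
\psi_p(x_i, x_j) = \mu(x_i, x_j) \left[
    \omega_1 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\alpha^2}
                          -\frac{\lVert I_i - I_j \rVert^2}{2\sigma_\beta^2} \right)
  + \omega_2 \exp\!\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\gamma^2} \right)
\right]
```

In this form σ_α scales the position term and σ_β the color term of the first kernel, while σ_γ scales the position term of the second kernel, which agrees with the statement that ω₂ and σ_γ mainly affect smoothness.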

This example determines the hyperparameters of the fully connected conditional random field by cross validation. It first sets the two values ω₂ and σ_γ. These two parameters have little influence on classification accuracy and mainly affect smoothness. The initial values are ω₂ = 1 and σ_γ = 1, but based on the test results this example finally sets ω₂ = 3 and σ_γ = 3. For the three hyperparameters ω₁, σ_α and σ_β, this example uses a coarse-to-fine search for the optimal values. A small number of pictures are selected from the training data set for the search, and the initial values of the three parameters are set to ω₁ = 3, σ_α = 30 and σ_β = 3. The initial search ranges are ω₁ ∈ [3:6], σ_α ∈ [30:10:100] and σ_β ∈ [3:6], meaning that ω₁ and σ_β are searched from 3 to 6 in increments of 1, and σ_α is searched from 30 to 100 in increments of 10. After each round, the search is repeated within the range around the optimal value with the increment halved, until the search finally stops; this ensures that the conditional random field parameters set in this example are optimal. Through this search, the three values used in this example are ω₁ = 5, σ_α = 49 and σ_β = 3.
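The coarse-to-fine search described above can be sketched in Python as follows. The names `coarse_to_fine_search` and `frange`, the fixed number of rounds used as the stopping condition, and the `score` callback (for instance, segmentation accuracy on the selected pictures) are assumptions for illustration; the stopping condition actually used is not stated.

```python
import itertools

def frange(lo, hi, step):
    """Inclusive range of values from lo to hi with the given step."""
    values, v = [], lo
    while v <= hi + 1e-9:
        values.append(round(v, 6))
        v += step
    return values

def coarse_to_fine_search(score, ranges, rounds=3):
    """Grid search that narrows around the best point and halves the step.

    score:  callable taking a dict of parameters, higher is better (assumed).
    ranges: dict name -> (lo, hi, step) giving the initial search grid.
    """
    best = None
    for _ in range(rounds):
        names = list(ranges)
        grids = [frange(*ranges[n]) for n in names]
        candidates = (dict(zip(names, c)) for c in itertools.product(*grids))
        best = max(candidates, key=score)
        # Re-search one step either side of the best value, with half the step.
        ranges = {
            n: (max(lo, best[n] - step), min(hi, best[n] + step), step / 2)
            for n, (lo, hi, step) in ranges.items()
        }
    return best
```

With the initial ranges quoted above, the first round evaluates 4 × 8 × 4 = 128 parameter combinations, and each later round evaluates a much smaller grid around the current best point.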

The effect of the present invention is further described below in conjunction with simulation tests:

1. Simulation conditions and content

The hardware simulation platform of the present embodiment is as follows: the CPU is an Intel Core [email protected], with 8.0 GB of memory; the graphics card is an NVIDIA Titan Xp with 12 GB of video memory.

Simulation 1: an image to be segmented is selected and input into the trained FCN-8s model; at the same time, the SLIC superpixel segmentation algorithm is used to obtain the superpixel segmentation result of the image; the local edge optimization model is then used to perform edge optimization on the semantic segmentation result of the image. The optimization effect is shown in Fig. 4;

Simulation 2: the segmentation result obtained in Simulation 1 after preliminary edge optimization is further optimized using the trained conditional random field model. The comparison is shown in Fig. 5;

Simulation 3: based on per-class segmentation accuracy, an accuracy comparison experiment is carried out on the test images of the VOC data set using the present invention and two existing well-known semantic segmentation methods. The results are shown in Fig. 6;

Simulation 4: based on per-class segmentation accuracy, an accuracy comparison experiment is carried out on the test images of the Cityscapes data set using the present invention and two existing well-known semantic segmentation methods. The results are shown in Fig. 7.

2. Analysis of simulation results

Referring to Fig. 4, it can be seen that many superpixels with clear, smooth and prominent edges fit the boundaries of the objects well. Most of these regions belong to case A; that is, for the majority of pixels, the semantic labels assigned by the FCN model are kept. The misclassified cases B, C and D can be found in the enlarged local regions marked by boxes in Fig. 4. The most common error in case B is that background pixels are misclassified as other classes, and the enlarged local regions show that the optimization algorithm of the present invention effectively corrects such errors. Correcting the errors of case C would undoubtedly improve the accuracy of the semantic segmentation algorithm; the boxed regions in the figure contain a large number of such superpixels, and the optimization algorithm of the present invention corrects the mislabeled pixels one by one. Superpixels of case D carry semantic information of different classes, and the numbers of pixels belonging to the different classes are roughly equal. Such superpixels generally occur in complex environments such as weak edges or small structures and are easily misclassified. For these superpixels, the present invention does not rashly reclassify them, but keeps the segmentation result provided by the front end. According to the results in Fig. 4, all four cases mentioned above are involved, which demonstrates that the edge optimization algorithm proposed by the present invention can reassign the semantic labels of pixels within a superpixel according to the semantic label allocation strategy.

Referring to Fig. 5, the following phenomena can be seen in the three local details circled by boxes:

(1) The semantic segmentation result after superpixel edge optimization alone is not yet good enough and leaves room for further improvement, whereas the semantic segmentation result after the CRF constraint is added is much closer to the true semantic labels.

(2) In large-scale image structures, such as the part where the bottom of the train meets the rails, the superpixels cannot accurately locate the object edge, so the local edge optimization algorithm keeps the labeling result of the FCN model. After the precise edge recovery of the CRF, the pixels mislabeled by the FCN model can be corrected and reassigned to the correct semantic class.

(3) For some small structures, such as the handrails and railings on the top and the front of the train, the correction ability of superpixel edge optimization is limited, while the constraint of the conditional random field achieves good results and makes the optimized image edges fit the real object edges more closely.

Referring to Fig. 6, it can be seen that all three algorithms achieve their highest recognition rate on the background, exceeding 90%. Among all classes, the recognition accuracy is lowest for the chair class, at only 20% to 35%; the segmentation accuracy of the three models for classes such as bird, bus, cat, motorcycle, person and train is at a relatively high level. It can be seen that the optimization algorithm proposed by the present invention achieves the best performance in most classes. In particular, compared with FCN-8s, the IoU score improves significantly in almost all classes. The above analysis of the segmentation results of the present invention and of the existing semantic segmentation methods based on fully convolutional networks shows that the present invention inherits well the excellent ability of the FCN model to extract high-level semantic information from images. At the same time, because the model makes full use of positional information such as image edges, it locates structures such as fine edges and fine cracks in the image more accurately.

Referring to Fig. 6 and Fig. 7, it can be seen that the experimental results on the Cityscapes data set remain consistent with the results on the VOC data set. With the optimization algorithm proposed by the present invention, the IoU score is significantly improved compared with FCN-8s. The quantitative comparisons on the two data sets in Fig. 6 and Fig. 7 show that the proposed algorithm, which performs edge optimization with superpixels and edge recovery with a conditional random field, effectively improves semantic segmentation accuracy. This indicates that using the low-level image information obtained by traditional image segmentation algorithms to optimize the coarse result of semantic segmentation is an effective way to improve the segmentation accuracy of existing semantic segmentation algorithms.
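The per-class IoU score used in these comparisons can be computed as in the following sketch. The function name `per_class_iou` and its inputs are illustrative assumptions; the evaluation code behind Fig. 6 and Fig. 7 is not given in the text.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Intersection over union for each class of two integer label maps.

    pred, gt: arrays of the same shape holding class indices.
    Returns a list with one IoU per class; NaN when a class is absent in both.
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    scores = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        scores.append(float(inter) / float(union) if union else float("nan"))
    return scores
```

Comparing the returned lists for the front-end output and for the optimized output gives the kind of class-by-class comparison described above.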

A data selection module, for selecting image data;

A training and validation module, for training and validating the image semantic segmentation model and the fully connected conditional random field model using the image data;

A semantic segmentation module, for obtaining the semantic segmentation result of an image using the trained image semantic segmentation model;

A first optimization module, for optimizing the semantic segmentation result using the superpixel segmentation result to form a first optimization result;

A second optimization module, for optimizing the first optimization result using the trained fully connected conditional random field model.

The method proposed by the present invention aims to optimize the segmentation results of existing algorithms. In a specific implementation, different superpixel segmentation algorithms can be used as needed, so the method is flexible to implement and highly compatible. In terms of performance, it not only efficiently extracts the high-level semantic information in the image, but also uses low-level image information to segment image edges accurately, while being robust to propagation errors. Superpixels are used to preserve the object edges in the image; the local edge optimization algorithm improves the semantic segmentation accuracy of the existing segmentation model at image edges; and the fully connected conditional random field constrains pixels that are similar in color and spatial position, making full use of the relationships between the pixels of the image, thereby further optimizing the semantic segmentation result and giving a more accurate segmentation of image edges.

The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, a number of simple deductions or substitutions may be made without departing from the concept of the present invention, and all of these shall be regarded as falling within the protection scope of the present invention.

Claims

1. A semantic segmentation optimization method for edge images, characterized by comprising:

selecting image data;

2. The method according to claim 1, characterized in that the image data comprises:

the VOC data set or the Cityscapes data set.

3. The method according to claim 1, characterized in that said training and validating the image semantic segmentation model and the fully connected conditional random field model using the image data comprises:

dividing the image data into a training set and a validation set;

using the training set as input, training the image semantic segmentation model and the fully connected conditional random field model through iterative supervised training;

using the validation set as input, validating the trained image semantic segmentation model and the trained fully connected conditional random field model.

4. The method according to claim 1, characterized in that the image semantic segmentation model is the FCN-8s model.

5. The method according to claim 1, characterized in that the superpixel segmentation algorithm is the SLIC superpixel segmentation algorithm.

6. The method according to claim 1, characterized in that said optimizing the semantic segmentation result using the superpixel segmentation result to form a first optimization result comprises:

7. A semantic segmentation optimization device for edge images, characterized by comprising:

a data selection module, for selecting image data;

a training and validation module, for training and validating the image semantic segmentation model and the fully connected conditional random field model using the image data;

a semantic segmentation module, for obtaining the semantic segmentation result of an image using the trained image semantic segmentation model;

a superpixel segmentation module, for obtaining the superpixel segmentation result of the image edge information using a superpixel segmentation algorithm;

a first optimization module, for optimizing the semantic segmentation result using the superpixel segmentation result to form a first optimization result;

a second optimization module, for optimizing the first optimization result using the trained fully connected conditional random field model.