CN104134217B - Video salient object segmentation method based on super voxel graph cut - Google Patents
Abstract
The invention discloses a segmentation method for salient objects in video. The method comprises the following steps: first, the static saliency of each frame in a video sequence is computed over superpixels to obtain a static saliency map; second, the optical flow between consecutive frames is computed over superpixels and the dynamic saliency of each frame is calculated to obtain a dynamic saliency map; third, the static and dynamic saliency maps are fused into a dynamic-static saliency map; fourth, an objectness map is computed for each frame; fifth, a spatio-temporal over-segmentation of the video sequence is computed, and the static saliency values, dynamic saliency values, and objectness values are each mapped onto it; sixth, a segmentation energy function relating saliency, objectness, and continuity is established and optimized at the supervoxel level of the spatio-temporal over-segmentation by iterative graph cut on each video frame, so that binary segmentation of each frame yields the salient foreground object.
Description
Technical field
The present invention relates to the technical field of computer vision, and in particular to a video salient object segmentation method based on supervoxel graph cut; the method segments salient objects in video based on dynamic-static saliency, objectness, and continuity.
Background technology
The segmentation of salient objects in video sequences, as a basis of video processing, is widely applied in many fields of computer vision, such as video summarization, human action recognition, video retrieval, object recognition in video, and video activity analysis. General difficulties in segmenting objects in video sequences include camera motion, motion and change of the background, and the motion and deformation of the salient foreground object itself. Salient object segmentation in video can be divided into two main classes: non-automatic segmentation and automatic segmentation.
Non-automatic segmentation: these methods require user participation. The user must manually annotate the salient object in the first frame or in some key frames of the video as initialization data; region tracking or propagation is then used to obtain the salient object segmentation of every frame. The drawback is that manual annotation is tedious and time-consuming, so such methods are unsuitable for practical applications with large amounts of data.
Automatic segmentation: there are several implementations. 1) Methods based on background subtraction: the background is modeled and updated, and differencing a frame against the background image yields the pixel regions that differ strongly; this approach is less suitable when the background moves or changes sharply. 2) Methods based on clustering, such as motion clustering, trajectory clustering, and spatio-temporal clustering; these are unsuitable when object displacement is complex, for example when some of the objects are moving. 3) Methods based on object motion: the video frames are typically first divided into many clusters that may contain objects, and segmentation is then performed within those clusters; the complexity of such methods can be high.
Although segmentation has been a research problem for many years, the sharp increase in video data has correspondingly increased the demand for automatic video salient object segmentation. Such segmentation inevitably faces the difficulties of background motion and change, and of the uncertain, compound motion and deformation of the foreground object itself. It is therefore desirable to provide a low-cost method for segmenting salient objects in video that is suitable for ordinary users, convenient, accurate, and practical.
Content of the invention
In order to solve the problems of the prior art, an object of the invention is to provide a video salient object segmentation method based on graph cut.

In order to achieve this purpose, the invention constructs an energy equation from information about object appearance, motion, objectness, and continuity, which reduces the interference of a moving background, and uses image over-segmentation into superpixels and spatio-temporal video over-segmentation into supervoxels to reduce the complexity of the algorithm.
The video salient object segmentation method based on supervoxel graph cut proposed by the invention includes the following steps.

Step 1, segment the salient object in the first frame of the video sequence; this step further includes: step 101, over-segment the frame to obtain superpixels; step 102, calculate a static saliency map from the contrast and distribution of color features; step 103, calculate a dynamic saliency map from the contrast and continuity of optical-flow magnitudes; step 104, fuse the static and dynamic saliency maps into a dynamic-static saliency map; step 105, calculate the objectness of the first frame and the ROI candidate regions of each potential object; step 106, fuse the dynamic-static saliency map with the object ROIs and filter out redundant ROI regions; step 107, with the ROI regions and the dynamic-static saliency as weak constraints, construct an energy equation and segment with iterative graph cut to obtain an estimate of the salient object.

Step 2, segment the salient object in each frame of the video sequence other than the first; this step further includes: step 201, propagate the estimated region of the previous frame to the next frame as a prior; step 202, apply steps 101, 102, 103, and 105 to this frame to compute the required mid-level feature values; step 203, compute the spatio-temporal over-segmentation of the video, construct an energy equation over appearance, motion, objectness, and continuity, and minimize this energy equation with graph cut to obtain the salient object segmentation.
Beneficial effects of the invention: based on image over-segmentation into superpixels, the invention obtains static and dynamic saliency maps from the contrast and continuity of color and optical flow respectively. The use of superpixels reduces the complexity of the algorithm, and considering feature distribution as well as feature contrast reduces the interference of background objects whose color is close to the foreground. The objectness computation adds a further basis for segmentation and improves accuracy. Applying graph cut on the supervoxels of the spatio-temporal over-segmentation further reduces the space-time complexity, and graph cut itself has linear complexity, so the computational cost of the algorithm is low and no professional, expensive equipment is needed. Unlike traditional non-automatic video salient object segmentation methods, the invention requires no manual annotation by professionals while enabling higher-quality salient object segmentation in video sequences.
Brief description
Fig. 1 is a flow chart of the video salient object segmentation method based on supervoxel graph cut according to the present invention;
Fig. 2A is the original image of a single video frame;
Fig. 2B is the over-segmentation of a single video frame, i.e., a schematic diagram of superpixels;
Fig. 3 is a schematic diagram of the static saliency of a video frame;
Fig. 4 is a schematic diagram of the dynamic saliency of a video frame;
Fig. 5 is a schematic diagram of the objectness of a video frame;
Fig. 6 is a schematic diagram of the pixel-level objectness of a video frame;
Fig. 7 is a schematic diagram of the fused dynamic-static saliency of a video frame;
Fig. 8 is a schematic diagram of the salient supervoxel result of a video;
Fig. 9 is a schematic diagram of the video salient object segmentation result;
Fig. 10 is a schematic diagram of the fusion of the dynamic and static saliency maps, showing from left to right the original video frame, the dynamic saliency map, the static saliency map, and the fused dynamic-static saliency map;
Fig. 11 is a segmentation result figure; the area outlined at far left is the segmented region, followed from left to right by the fused dynamic-static saliency map, the dynamic saliency map, the static saliency map, and the objectness map.
Specific embodiment
The present invention will be described in detail below. It should be noted that the described embodiments are intended merely to facilitate understanding of the invention and impose no restriction on it.
The invention uses graph cut to segment salient objects in video sequences based on dynamic-static saliency, objectness, and continuity. The method has two stages: the processing of the first frame, and the segmentation of each subsequent frame. The first stage preprocesses the first frame of the video to obtain an estimate of its salient object region; because of the first frame's position in time and its importance for propagation, it is preprocessed in order to reach a more accurate result. The second stage processes the video frames one by one to obtain the salient object segmentation of each frame; this is the core procedure, in which the energy equation, designed around object appearance, motion, objectness, and continuity, is intended to reduce interference such as background change and the deformation and motion of the object itself.
According to the method of the invention, the salient object region estimate of the first frame is first obtained by preprocessing. Then superpixels are used to calculate the static saliency of each frame in the video sequence, yielding a static saliency map; superpixels are used to calculate the optical flow between consecutive frames and the dynamic saliency of each frame, yielding a dynamic saliency map; the static and dynamic saliency maps are fused into a dynamic-static saliency map; the objectness map of each frame of the video sequence is calculated; the supervoxels of the spatio-temporal over-segmentation of the video sequence are calculated, and the pixel-level static saliency, dynamic saliency, and objectness values are each mapped onto the spatio-temporal over-segmentation; and a segmentation energy function over saliency, objectness, and continuity is established and optimized for each video frame with graph cut at the level of the spatio-temporal over-segmentation, giving a binary segmentation of each frame and thus the salient foreground object.
Fig. 1 shows the video salient object segmentation method based on supervoxel graph cut according to the present invention.
According to the video obvious object dividing method of the present invention, comprise the steps of:
Step 1: first, each frame image in the video sequence is over-segmented with the K-means algorithm to obtain superpixels. A superpixel schematic is shown in Fig. 2B.

In this step, pixels with similar color that are close to each other are clustered based on the 5-dimensional information formed by each frame's Lab color and position coordinates x, y, giving the over-segmentation of the single-frame image; the Lab values are the 3 dimensions of the Lab color space, and x, y are the horizontal and vertical pixel coordinates. The result is an over-segmentation whose regions are similar in color and compact in space, as illustrated in Fig. 2B. Because over-segmented regions mostly retain the information needed for further image segmentation and generally do not destroy the boundaries of objects in the image, the image can be processed directly at the superpixel level to reduce computational cost.
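The clustering described above can be sketched as a plain K-means over (L, a, b, x, y) features. This is a minimal illustrative stand-in, not the patented implementation: the function name, the iteration count, and the `xy_weight` balance between color and position are assumptions, and a practical SLIC-style method would restrict each center's search to a local window.

```python
import numpy as np

def superpixels_kmeans(image_lab, n_segments=100, n_iter=10, xy_weight=1.0, seed=0):
    """Cluster pixels on (L, a, b, x, y) features: a minimal sketch of the
    K-means over-segmentation of Step 1 (global search, small images only)."""
    h, w, _ = image_lab.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.concatenate(
        [image_lab.reshape(-1, 3),
         xy_weight * np.stack([xs.ravel(), ys.ravel()], axis=1)],
        axis=1).astype(np.float64)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), n_segments, replace=False)]
    for _ in range(n_iter):
        # assign each pixel to the nearest center in the 5-D feature space
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(n_segments):           # recompute cluster means
            m = labels == k
            if m.any():
                centers[k] = feats[m].mean(0)
    return labels.reshape(h, w)
```

Because the pairwise distance matrix is computed globally, this sketch is only suitable for small images; the design choice of bundling position into the feature vector is what makes the resulting clusters spatially compact.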
Step 2: calculate the static and dynamic saliency maps of the video frames.

In this step, both the static and the dynamic saliency map require first computing a center-surround contrast saliency map and a distribution-compactness saliency map. For the static map, a color-contrast saliency map and a color-distribution saliency map are computed first, and the final static saliency map is the fusion of the two; the dynamic saliency map is likewise obtained by fusing a saliency map of optical-flow magnitude contrast with a saliency map of motion continuity.
The static color contrast is calculated as follows:

Cs_j = Σ_{k=1}^{N} w(p_j, p_k) · ||c_j − c_k|| (1)

Wherein N is the total number of over-segments (superpixels) of the video frame; Cs_j is the static color contrast of the j-th superpixel, with j ranging from 1 to N; c_j and c_k are the mean Lab colors of the j-th and k-th superpixels, with k ranging from 1 to N; p_j and p_k are the mean positions of all pixels in the j-th and k-th superpixels; and w(p_j, p_k) is a coefficient on the position relationship, which can be set to the constant 1 or made to vary with the distance between superpixels — here it is set to the Gaussian weight w(p_j, p_k) = e^{−||p_j − p_k||² / (2σ_p²)}. ||c_j − c_k|| is the difference between c_j and c_k; the larger this difference, the larger the static color contrast Cs_j, and a larger contrast means the superpixel is more unique in terms of color.
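Equation (1) can be sketched directly in numpy. This is an illustrative implementation under stated assumptions: the function name and the value of the Gaussian scale `sigma_p` are not from the patent, and positions are assumed normalized to [0, 1].

```python
import numpy as np

def static_color_contrast(colors, positions, sigma_p=0.25):
    """Cs_j = sum_k w(p_j, p_k) * ||c_j - c_k||, Gaussian position weights.
    colors: (N, 3) mean Lab color per superpixel; positions: (N, 2) mean position."""
    pd2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    w = np.exp(-pd2 / (2.0 * sigma_p ** 2))        # nearby superpixels weigh more
    cd = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
    return (w * cd).sum(1)                         # (N,) contrast per superpixel
```

A superpixel whose color matches its (spatially weighted) surroundings gets a contrast near zero, which is the behavior equation (1) describes.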
The contrast equation of the dynamic motion magnitude is as follows:

Cm_j = Σ_{k=1}^{N} w(p_j, p_k) · (1 − e^{−D(Hf_j, Hf_k)}) (2)

Wherein Cm_j is the dynamic motion contrast of the j-th superpixel; p_j and p_k are again the mean positions of all pixels in the j-th and k-th superpixels, with j and k ranging from 1 to N and N the total number of over-segments of the frame; w(p_j, p_k) is a coefficient that can be set to the constant 1 or made to vary with the distance between superpixels, and is again set here to the Gaussian weight; Hf_j and Hf_k are the optical-flow magnitude histograms of the j-th and k-th superpixels. The flow-magnitude histograms in this algorithm have a depth of 2: the first layer is the histogram of flow magnitudes in the horizontal direction and the second layer the histogram in the vertical direction, so the histogram captures not only the distribution of motion magnitude but, to some extent, the direction of motion as well. D(Hf_j, Hf_k) is the chi-square distance between Hf_j and Hf_k; since the chi-square distance ranges from 0 to infinity, a negative exponential function is used here to map it to [0, 1] for the calculation. Thus the larger the chi-square distance between Hf_j and Hf_k, the larger the dynamic motion contrast Cm_j, and a larger contrast means the superpixel is more unique in terms of motion intensity.
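The chi-square distance and the negative-exponential mapping of equation (2) can be sketched as follows; function names and `sigma_p` are illustrative assumptions, and each flow histogram is assumed to be the two-layer (horizontal, vertical) array described above, flattened before comparison.

```python
import numpy as np

def chi2(h1, h2, eps=1e-12):
    """Chi-square distance between (flattened) flow-magnitude histograms."""
    h1 = np.asarray(h1, float).ravel()
    h2 = np.asarray(h2, float).ravel()
    return 0.5 * (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()

def dynamic_motion_contrast(flow_hists, positions, sigma_p=0.25):
    """Cm_j = sum_k w(p_j, p_k) * (1 - exp(-chi2(Hf_j, Hf_k)))."""
    n = len(flow_hists)
    pd2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    w = np.exp(-pd2 / (2.0 * sigma_p ** 2))
    d = np.array([[chi2(flow_hists[j], flow_hists[k]) for k in range(n)]
                  for j in range(n)])
    return (w * (1.0 - np.exp(-d))).sum(1)    # distances squashed into [0, 1)
```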
The computing formula of the static distribution compactness is as follows:

Ds_j = Σ_{k=1}^{N} w(c_j, c_k) · ||p_k − μc_j||² (3)

Wherein Ds_j is the static distribution compactness of the j-th superpixel: the lower the spatial spread of the j-th superpixel's color, the lower the value Ds_j, i.e., the more spatially compact the superpixel. w(c_j, c_k) is a coefficient on the color similarity between superpixels, which can be set to the constant 1 or made to vary with the color similarity between superpixels — here it is set to the Gaussian weight w(c_j, c_k) = e^{−||c_j − c_k||² / (2σ_c²)}; p_k is the mean position of all pixels of the k-th superpixel; N is the total number of over-segments of the frame; and μc_j = Σ_k w(c_j, c_k) · p_k is the mean position of the superpixels having color similar to the j-th superpixel.
The computing formula of the dynamic motion continuity is as follows:

Dm_j = Σ_{k=1}^{N} w(Hf_j, Hf_k) · ||p_k − μm_j||² (4)

Wherein Dm_j is the dynamic motion continuity of the j-th superpixel; w(Hf_j, Hf_k) = e^{−D(Hf_j, Hf_k)} is a coefficient on the similarity of the flow-magnitude histograms between superpixels, with Hf_j and Hf_k the flow-magnitude histograms of the j-th and k-th superpixels and D(Hf_j, Hf_k) the chi-square distance between them — the more similar Hf_j and Hf_k, the larger the value of w(Hf_j, Hf_k); and μm_j = Σ_k w(Hf_j, Hf_k) · p_k is the mean position of the over-segments having flow-magnitude histograms similar to Hf_j, where p_k is the mean position of all pixels of the k-th superpixel.
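Equations (3) and (4) share one pattern: a weighted spatial variance around a weighted mean position, with the weights coming from color similarity for Ds and from flow-histogram similarity for Dm. A generic numpy sketch follows; the function name and the row-normalization of the weights are assumptions not stated in the text (normalization makes μ a proper weighted mean).

```python
import numpy as np

def distribution_compactness(weights, positions):
    """Generic compactness of Eqs. (3)/(4):
    D_j = sum_k w_jk * ||p_k - mu_j||^2,  mu_j = sum_k w_jk * p_k,
    with each row of `weights` normalized to sum to 1 (assumed)."""
    w = weights / weights.sum(1, keepdims=True)
    mu = w @ positions                                   # (N, 2) weighted mean positions
    d2 = ((positions[None, :, :] - mu[:, None, :]) ** 2).sum(-1)
    return (w * d2).sum(1)                               # low value = compact
```

Plugging in Gaussian color-similarity weights gives Ds; plugging in e^{−chi²} flow-histogram weights gives Dm.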
The static saliency map Ss is fused from the static color contrast Cs and the static distribution compactness Ds; the fusion formula is:

Ss_j = Cs_j · e^{−Ds_j} (5)

Wherein Ss_j is the static saliency of the j-th superpixel, Cs_j is its static color contrast, and Ds_j is its static distribution compactness; the larger Cs_j and the smaller Ds_j, the larger the value Ss_j.
The dynamic saliency map Sm is fused from the dynamic motion contrast Cm and the dynamic motion continuity Dm; the fusion formula is:

Sm_j = Cm_j · e^{−Dm_j} (6)

Wherein Sm_j is the dynamic saliency of the j-th superpixel, Cm_j is its dynamic motion contrast, and Dm_j is its dynamic motion continuity; the larger Cm_j and the smaller Dm_j, the larger the value Sm_j.

The static saliency of a video frame is illustrated in Fig. 3, and the dynamic saliency in Fig. 4.
Step 3: fuse the static and dynamic saliency maps.

The strategy taken in this step is that the static saliency map Ss and the dynamic saliency map Sm complement each other. Since human attention is more easily attracted by motion, regions with very high motion saliency are retained, while regions without very high motion saliency may be noise introduced by the optical-flow algorithm or by background motion, and need to be considered jointly with the static saliency map. The fusion formula is as follows:

Sal_j = Sm_j if Sm_j ≥ Ts, and Sal_j = Ss_j · Sm_j otherwise (7)

Wherein Sal_j is the dynamic-static saliency value of the j-th superpixel obtained by fusing its static saliency Ss_j and dynamic saliency Sm_j, and Ts is a deliberately high threshold, set here to 0.8. Ts is set this high for three reasons: first, following the motion-priority principle, regions with high motion saliency are retained; second, regions with ambiguous motion saliency values can be corrected by the static saliency, reducing the influence of optical-flow noise and camera movement; and last, when the motion saliency is very small, increasing its influence suppresses the interference of salient background objects with the salient foreground object.
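The motion-priority fusion of Step 3 can be sketched in a few lines. Note the hedge: the high-motion branch (keep Sm when it exceeds Ts) is explicit in the text, while the multiplicative low-motion branch is one plausible reading of the description, not a confirmed reproduction of the patent's equation (7).

```python
import numpy as np

def fuse_saliency(ss, sm, ts=0.8):
    """Dynamic-static fusion sketch: regions with motion saliency >= Ts keep
    their motion saliency; elsewhere the static map corrects (and very low
    motion suppresses) the result via a product -- an assumed combination."""
    ss = np.asarray(ss, float)
    sm = np.asarray(sm, float)
    return np.where(sm >= ts, sm, ss * sm)
```

With this form, a background object that is statically salient (high Ss) but static (Sm near 0) is suppressed, matching the third rationale given above.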
Fig. 7 shows the dynamic-static saliency map obtained by fusing the dynamic and static saliency maps of a video frame.
Step 4: calculate the objectness of the video frame.

In this step, the objectness calculation for the first frame differs slightly: in addition to the pixel-level objectness map computed for every frame, candidate ROI regions of likely objects are also wanted for the video sequence. The inputs here include the previously obtained color contrast and superpixels, plus boundary information obtained with the Canny edge detector. All three inputs are closely bound to objects: the color contrast represents the contrast between the foreground object's color and the background; each superpixel of the over-segmentation represents a controllably sized region of consistent color that preserves boundary information, so a single over-segment very likely belongs to a single object; and boundaries are likewise an important attribute of objects. An objectness detector based on a Bayesian model then outputs the final candidate ROI regions Ro that potentially contain an object together with their objectness values O, while the intermediate result yields the pixel-level probability objectness map.

Fig. 5 is a schematic diagram of the objectness ROIs of a video frame, and Fig. 6 of its pixel-level objectness.
Step 5: screening of the objectness candidate ROI regions.

In this step, the dynamic-static saliency map is first thresholded at 0.5: regions with saliency greater than or equal to 0.5 are retained and the rest discarded, giving the saliency map Rh of saliency above 0.5. For convenience of the later operations, the thresholded saliency map must be binarized: using a flood-fill algorithm, the connected regions of the image are filled with 1 and the remaining area is set to 0. After the binarized threshold map is obtained, a morphological opening — erosion followed by dilation — is applied to remove small bright areas and reduce the interference of noise.
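The threshold-then-open cleanup can be sketched with a hand-rolled 3x3 opening (function names are illustrative; a real pipeline would use a morphology library routine):

```python
import numpy as np

def binary_open(mask, iters=1):
    """Morphological opening (erosion then dilation) with a 3x3 structuring
    element, removing small bright specks as described in Step 5."""
    def shifts(m):
        p = np.pad(m, 1)
        return [p[1 + dy:1 + dy + m.shape[0], 1 + dx:1 + dx + m.shape[1]]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = np.asarray(mask, bool)
    for _ in range(iters):                      # erosion: all 3x3 neighbors on
        out = np.logical_and.reduce(shifts(out))
    for _ in range(iters):                      # dilation: any 3x3 neighbor on
        out = np.logical_or.reduce(shifts(out))
    return out

def threshold_saliency(sal, thr=0.5):
    """Keep regions with saliency >= 0.5, then clean them with an opening."""
    return binary_open(np.asarray(sal) >= thr)
```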
Thereafter, for each connected region Rs, an ROI region covering it is fitted. By horizontal and vertical scanning, the leftmost, rightmost, topmost, and bottommost points of the connected region Rs are found: ((x_l, y_l), (x_r, y_r), (x_u, y_u), (x_d, y_d)), where x_l, y_l are the coordinates of the leftmost point, x_r, y_r of the rightmost point, x_u, y_u of the topmost point, and x_d, y_d of the bottommost point. The four vertex coordinates of the fitted covering ROI region (in counter-clockwise order) are ((x_l − 0.05(x_r − x_l), y_u), (x_l − 0.05(x_r − x_l), y_d), (x_r + 0.05(x_r − x_l), y_d), (x_r + 0.05(x_r − x_l), y_u)): the box is widened by 5% on the left and right, and enlarged analogously in the vertical direction. Here (x_l − 0.05(x_r − x_l), y_u) is the upper-left vertex of the fitted ROI rectangle, (x_l − 0.05(x_r − x_l), y_d) the lower-left vertex, (x_r + 0.05(x_r − x_l), y_d) the lower-right vertex, and (x_r + 0.05(x_r − x_l), y_u) the upper-right vertex.
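The box-fitting step above can be sketched as follows (the function name is illustrative; only the horizontal 5% padding given explicitly in the vertex coordinates is implemented):

```python
def fit_roi(region_pixels, pad=0.05):
    """Fit an axis-aligned box around a connected region's pixels and widen it
    by `pad` of the box width on the left and right, per Step 5.
    region_pixels: iterable of (x, y) coordinates; returns the four vertices
    counter-clockwise: upper-left, lower-left, lower-right, upper-right."""
    xs = [x for x, y in region_pixels]
    ys = [y for x, y in region_pixels]
    xl, xr, yu, yd = min(xs), max(xs), min(ys), max(ys)
    dx = pad * (xr - xl)
    return [(xl - dx, yu), (xl - dx, yd), (xr + dx, yd), (xr + dx, yu)]
```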
What is done next is the preliminary screening of the candidate ROI regions Ro that may contain an object. First, for each candidate ROI region Ro_j, the area of its intersection with Rs is calculated and compared with the candidate's own area; this ratio should exceed a threshold To. Besides requiring the candidate region to intersect the salient region, another screening criterion is that the candidate should surround the salient region as far as possible, so the ratio of the area of the intersection of Ro_j with Rs to the area of Rs is also required to exceed a threshold Ts, as shown below:

R = { Ro_j | area(Ro_j ∩ Rs) ÷ area(Ro_j) > To ∧ area(Ro_j ∩ Rs) ÷ area(Rs) > Ts } (8)

Wherein R is the set of regions Ro_j filtered out by the above formula; Ro_j denotes the j-th candidate ROI region; area(Ro_j ∩ Rs) denotes the size of the region where candidate ROI Ro_j intersects the salient region Rs; area(Ro_j) denotes the size of candidate ROI Ro_j; area(Rs) denotes the size of the salient region Rs; and To and Ts are both thresholds. This screening step mainly excludes candidate regions that obviously do not meet the requirements, reducing the computation of the finer screening in the next step.
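Equation (8) is a simple double-ratio filter; a sketch follows. The threshold values and the dictionary-based interface are assumptions for illustration (the patent does not specify To and Ts numerically here).

```python
def screen_rois(candidates, inter_areas, rs_area, to=0.5, ts=0.5):
    """Preliminary screening of Eq. (8): keep Ro_j when
    area(Ro_j & Rs) / area(Ro_j) > To  and  area(Ro_j & Rs) / area(Rs) > Ts.
    candidates: name -> own area; inter_areas: name -> area of intersection
    with the salient region Rs; rs_area: area of Rs."""
    kept = []
    for name, own_area in candidates.items():
        inter = inter_areas[name]
        if inter / own_area > to and inter / rs_area > ts:
            kept.append(name)
    return kept
```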
Finally, for each candidate ROI region in the screened set R, three histograms are calculated: the saliency-value distribution histogram Hin of the set In of superpixels lying entirely inside the region; the saliency-value distribution histogram Hsu of the set Su of superpixels surrounding In, i.e., those entirely or partly outside the region; and the saliency-value distribution histogram Hbu of the set Bu of superpixels in the outermost ring of In, adjacent to Su. The contrast between Hin and Hsu and the contrast between Hsu and Hbu are then computed: the larger the difference between the saliency distributions of the superpixels inside the ROI and those around it, the more likely an object lies within the region; and the larger the difference between the saliency distributions of the inner ring and the surroundings, the better the region fits the object boundary. Finally, the algorithm selects the ROI region with the maximum difference value Diff as the final candidate ROI region; Diff is computed as shown below:

Diff_j = (1 − e^{−D(Hsu_j, Hin_j)}) + α · (1 − e^{−D(Hsu_j, Hbu_j)})² (9)

Wherein Diff_j denotes the difference value of the j-th candidate ROI region; Hin_j denotes the saliency-value distribution histogram of the superpixel set In inside the j-th candidate ROI region; Hsu_j denotes the saliency-value distribution histogram of the surrounding superpixel set Su, whose superpixels are entirely or partly outside the j-th candidate ROI region; and Hbu_j denotes the saliency-value distribution histogram of the superpixel set Bu directly adjacent to Su, i.e., in the "outermost ring" of In. The chi-square distance ranges from 0 to infinity, while 1 − e^{−D(Hsu, Hin)} and 1 − e^{−D(Hsu, Hbu)} both lie in [0, 1]; and because object shapes are usually irregular rather than rectangular, the boundary-fit term carries relatively less weight, so the second contrast term is squared and multiplied by a factor α smaller than 1. The ROI candidate region in R with the maximum Diff value is chosen as the final estimated ROI region.
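The scoring and selection of equation (9) can be sketched as follows; the value of `alpha` (only constrained to be below 1 in the text) and the function names are assumptions.

```python
import numpy as np

def chi2_hist(h1, h2, eps=1e-12):
    """Chi-square distance between saliency-value distribution histograms."""
    h1 = np.asarray(h1, float)
    h2 = np.asarray(h2, float)
    return 0.5 * (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()

def pick_roi(hins, hsus, hbus, alpha=0.5):
    """Eq. (9): Diff_j = (1 - e^{-D(Hsu,Hin)}) + alpha*(1 - e^{-D(Hsu,Hbu)})^2;
    returns the index of the candidate ROI with the largest Diff."""
    diffs = [(1 - np.exp(-chi2_hist(hsu, hin)))
             + alpha * (1 - np.exp(-chi2_hist(hsu, hbu))) ** 2
             for hin, hsu, hbu in zip(hins, hsus, hbus)]
    return int(np.argmax(diffs)), diffs
```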
Step 6: salient object segmentation of the first frame.

In order to obtain the segmentation of the salient object of the first frame, the following energy equation is built:

E(X) = A(X) + O(X) + AC(X) + OC(X) (10)

Wherein E(X) is the energy equation in units of superpixels, X is the superpixel set, A(X) is the object appearance unary term, O(X) is the objectness unary term, AC(X) is the color binary term, and OC(X) is the objectness binary term.
A(X) is the unary term for object appearance. The first frame is first clustered into two RGB Gaussian mixture models (GMMs): one GMM, the foreground model FG, is fitted to the region R_h where the fused static-dynamic saliency exceeds 0.5; the other, the background model BG, is fitted to the remaining region. Because a GMM yields a probability density over the data, it can serve as a density estimator; its role here is to estimate how likely a given super-pixel is to be foreground or background. If an over-segmentation matches the foreground well but is labelled background (label 0 for background, 1 for foreground), its penalty is large:
where A(X) is the object appearance unary term, the label of super-pixel x_i is 0 for background and 1 for foreground, the corresponding potential functions evaluate these labels, and p(x_i ∈ FG) and p(x_i ∈ BG) are the probabilities that super-pixel x_i belongs to the foreground FG and to the background BG, respectively.
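The GMM-based unary term can be sketched with scikit-learn (a sketch, not the patent's exact formulation: the component count, the -log-likelihood form of the potentials, and the function name are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def appearance_unary(colors, fg_mask, n_components=5):
    """Per-super-pixel appearance costs from two RGB GMMs.

    colors  : (N, 3) mean RGB colour of each super-pixel
    fg_mask : boolean (N,), True where fused saliency > 0.5
    Returns (cost_fg, cost_bg): penalties for labelling each
    super-pixel foreground (1) or background (0).
    """
    fg = GaussianMixture(n_components, random_state=0).fit(colors[fg_mask])
    bg = GaussianMixture(n_components, random_state=0).fit(colors[~fg_mask])
    # A super-pixel whose colour is unlikely under the background model
    # pays a large penalty for being labelled background, and vice versa.
    cost_bg = -bg.score_samples(colors)   # penalty for label 0
    cost_fg = -fg.score_samples(colors)   # penalty for label 1
    return cost_fg, cost_bg
```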
O(X) is the unary term for objectness. The region outside the ROI finally obtained in the previous step is treated as background, and the interior of the ROI as the possible object; in the same way, a GMM is computed over objectness, so the design of this objectness unary term parallels that of the appearance unary term:
where O(X) is the objectness unary term, the label of super-pixel x_i is 0 for background and 1 for foreground, the corresponding potential functions evaluate these labels, and p(x_i ∈ OBJ) and p(x_i ∈ OBG) are the probabilities that super-pixel x_i belongs to the possible object OBJ and to the background OBG outside the object, respectively.
The pairwise terms, in turn, capture the relationship between neighbouring over-segmentations: they impose a cost, or penalty, on discontinuity between adjacent regions. If two neighbouring over-segmentations differ very little, they very likely belong to the same object or the same background; if they differ greatly, the pair likely straddles an edge between object and background, and a cut between them is more plausible. The energy is therefore smaller when the difference between two neighbouring over-segmentations is larger.
The first pairwise term, AC(X), penalises discontinuity in appearance colour; the larger the colour distance, the weaker the influence of this discontinuity penalty. Its formula is as follows:
where AC(X) is the colour pairwise term, K_ij is its coefficient, the indicator equals 1 when the two labels differ and 0 when they agree, dist is the Euclidean distance between the centres of the two super-pixels, dcor is the difference of the colour means of the over-segmentations, and γ and β are coefficients.
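One plausible reading of the edge weight can be sketched as below (an assumption: the exact combination of dist with the exponential, and the constants γ = 50, β = 0.05, are illustrative, not taken from the patent):

```python
import numpy as np

def color_pairwise_weight(c_i, c_j, p_i, p_j, gamma=50.0, beta=0.05):
    """Edge weight K_ij between two neighbouring super-pixels.

    c_i, c_j : mean RGB colours; p_i, p_j : centre coordinates.
    Similar colours give a large weight, so assigning the pair
    different labels (cutting the edge) is expensive; dissimilar
    colours make the cut cheap, matching the edge-preserving intent.
    """
    dcor = np.linalg.norm(np.asarray(c_i, float) - np.asarray(c_j, float))
    dist = np.linalg.norm(np.asarray(p_i, float) - np.asarray(p_j, float))
    return gamma * np.exp(-beta * dcor ** 2) / max(dist, 1.0)
```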
Similarly, the pairwise term OC(X) penalising objectness discontinuity parallels the colour pairwise term. The pixel-level objectness map computed in step 4 is required first, and the pixel-level objectness values are mapped one by one onto the over-segmentations according to position. The formula for OC(X) is as follows:
where OC(X) is the objectness pairwise term, K_ij is its coefficient, the indicator equals 1 when the two labels differ and 0 when they agree, dist is the Euclidean distance between the centres of the two super-pixels, dobj is the difference of the objectness values of the over-segmentations, and γ and β are coefficients.
Once this energy equation is established, the t-links (edges between nodes and terminal nodes) and n-links (edges between nodes) are all in place, giving the graph required for graph cut; graph cut can then minimise the energy equation to obtain the segmentation. An iterative scheme similar to GrabCut is used here: each iteration refines the parameters of the GMMs modelling target and background, which in turn improves the segmentation. The salient object segmentation of the first frame is thus obtained.
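The minimisation itself can be illustrated at toy scale. The sketch below builds t-links and n-links as just described and solves the s-t min cut with a plain Edmonds-Karp loop; a real implementation would use a max-flow library and wrap this in the GrabCut-style GMM re-estimation loop, and all names and costs here are illustrative:

```python
from collections import deque

def min_cut_labels(n, t_links, n_links):
    """Binary labelling of n nodes by s-t min cut.

    t_links : list of (node, cost_fg, cost_bg)  -- unary penalties
    n_links : list of (i, j, weight)            -- pairwise weights
    Returns a list of 0/1 labels (1 = foreground, source side).
    """
    S, T = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, cost_fg, cost_bg in t_links:
        cap[S][i] += cost_bg   # paid if i ends up on the sink (BG) side
        cap[i][T] += cost_fg   # paid if i ends up on the source (FG) side
    for i, j, w in n_links:
        cap[i][j] += w         # paid only if i and j get different labels
        cap[j][i] += w
    flow = [[0.0] * (n + 2) for _ in range(n + 2)]
    while True:
        # BFS for an augmenting path in the residual graph.
        parent, q = {S: None}, deque([S])
        while q and T not in parent:
            u = q.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] - flow[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if T not in parent:
            break
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
    # Nodes still reachable from the source are labelled foreground.
    reach, q = {S}, deque([S])
    while q:
        u = q.popleft()
        for v in range(n + 2):
            if v not in reach and cap[u][v] - flow[u][v] > 1e-12:
                reach.add(v)
                q.append(v)
    return [1 if i in reach else 0 for i in range(n)]
```

With a strong n-link, an ambiguous node is pulled toward the label of its confident neighbour, which is exactly the smoothing behaviour the pairwise terms are designed to produce.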
Step 7: mapping each frame's static saliency, dynamic saliency and objectness onto the spatio-temporal over-segmentation of the video.
This step first applies a supervoxel method to the video to obtain its spatio-temporal over-segmentation, i.e. the supervoxels. The static saliency and dynamic saliency obtained through steps 1 and 2, together with the pixel-level objectness, are then mapped one by one onto the supervoxel over-segmentation according to position, and for each supervoxel the means of the static saliency, dynamic saliency and objectness over all the pixels it contains are computed as that supervoxel's static saliency value, dynamic saliency value and objectness value.
Fig. 8 is a schematic diagram of the salient supervoxel over-segmentation result of the video.
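The per-supervoxel averaging of step 7 can be sketched with `np.bincount` (a minimal sketch; the function name and the assumption that supervoxel ids are dense integers are illustrative):

```python
import numpy as np

def supervoxel_means(labels, *pixel_maps):
    """Average per-pixel maps (static saliency, dynamic saliency,
    objectness) over each supervoxel.

    labels     : integer array, one supervoxel id per pixel
    pixel_maps : float arrays with the same shape as labels
    Returns one (num_supervoxels,) mean vector per input map.
    """
    flat = labels.ravel()
    counts = np.bincount(flat)
    # Weighted bincount sums each map inside every supervoxel id.
    return [np.bincount(flat, weights=m.ravel()) / counts for m in pixel_maps]
```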
Step 8: salient object segmentation of each frame.
To segment the salient object of each frame after the first, an energy equation is again constructed, but it differs slightly from that of the first frame:
EF(V)=AF(V)+ACF(V)+OCF(V)+PCF(V) (21)
where EF(V) is the energy function defined over supervoxels, V is the supervoxel set, AF(V) is the object appearance unary term, ACF(V) is the colour pairwise term, OCF(V) is the objectness pairwise term, and PCF(V) is the persistence pairwise term.
AF(V) is again the unary term for object appearance, defined similarly to A(X) in formula (10). Assuming first that the motion of the salient object between two frames is smooth and gentle, the optical flow obtained during the dynamic saliency computation is used: for the salient object segmented in the previous frame, the displacement of each pixel in the salient object region is computed from the direction and speed of the flow, giving its position in the next frame. To speed up the algorithm, the graph nodes are the spatio-temporal supervoxels of the video rather than individual pixels; the set of spatio-temporal over-segmentation supervoxels containing pixels propagated from the previous frame is taken as the possible foreground salient object, and the remaining region as background. Two RGB GMMs are fitted to these two regions, giving a foreground model FG and a background model BG. The formula is as follows:
where AF(V) is the object appearance unary term, the label of spatio-temporal supervoxel v_i is 0 for background and 1 for foreground, the corresponding potential functions evaluate these labels, and p(v_i ∈ FG) and p(v_i ∈ BG) are the probabilities that v_i belongs to the foreground FG and to the background BG, respectively.
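The flow-based propagation of the previous frame's object region described above can be sketched as follows (nearest-pixel rounding; the function name and the (H, W, 2) flow layout are assumptions, and a real system would take the flow from the dynamic saliency step):

```python
import numpy as np

def propagate_mask(prev_mask, flow):
    """Carry the previous frame's object mask to the next frame by
    displacing each object pixel along its optical-flow vector.

    prev_mask : (H, W) bool, previous frame's segmentation
    flow      : (H, W, 2), per-pixel (dx, dy) displacement
    """
    h, w = prev_mask.shape
    nxt = np.zeros_like(prev_mask)
    ys, xs = np.nonzero(prev_mask)
    # Round each displaced position to the nearest pixel and clip
    # to the frame so the propagated mask stays inside the image.
    nx = np.clip(np.rint(xs + flow[ys, xs, 0]).astype(int), 0, w - 1)
    ny = np.clip(np.rint(ys + flow[ys, xs, 1]).astype(int), 0, h - 1)
    nxt[ny, nx] = True
    return nxt
```

Supervoxels containing any propagated pixel would then seed the possible-foreground region for the next frame's GMMs.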
The pairwise term ACF(V) is almost identical to AC(X) in formula (10); the only difference is that the graph nodes are no longer super-pixels but supervoxels, and dcor denotes the difference of the colour means over all pixels of the two spatio-temporal supervoxels. Its formula is as follows:
where ACF(V) is the appearance colour pairwise term, K_ij is its coefficient, the indicator equals 1 when the two labels differ and 0 when they agree, dist is the Euclidean distance between the centres of the spatio-temporal supervoxels, dcor is the difference of the colour means of the two supervoxels, and γ and β are coefficients.
Similarly, the pairwise term OCF(V) is almost identical to OC(X) in formula (10). The pixel-level objectness map computed in step 4 is required first; the pixel-level objectness values are then mapped one by one onto the over-segmentations according to position, and dobj is the difference of the mean objectness values of neighbouring supervoxels. Its formula is as follows:
where OCF(V) is the objectness pairwise term, K_ij is its coefficient, the indicator equals 1 when the two labels differ and 0 when they agree, dist is the Euclidean distance between the centres of the spatio-temporal over-segmentations, dobj is the difference of the supervoxel objectness values, and γ and β are coefficients.
Since the salient object in the video is assumed to move smoothly and gently between frames, it exhibits persistence. A pairwise term PCF(V) is therefore designed around persistence, attending to the inter-frame continuity of the temporal over-segmentation. If a supervoxel is labelled differently from a previous-frame over-segmentation with which it has high continuity and very similar appearance, the penalty is large; conversely, if two supervoxels with high continuity between the two frames and similar appearance receive the same label, the penalty is small. The degree of continuity of two supervoxels is the number of pixels of the previous-frame supervoxel that are displaced, according to the optical flow, into the next-frame supervoxel, divided by the total number of pixels of the former supervoxel; this ratio is denoted pers. Its formula is as follows:
K_ij = γ · pers(v_i, v'_j) · exp(-β · dcor(v_i, v'_j)²) (30)
where PCF(V) is the persistence pairwise term; in the formula, v denotes a supervoxel of the current frame and v' a supervoxel of the frame before it; K_ij is the coefficient; the indicator equals 1 when the two labels differ and 0 when they agree; dcor is the difference of the colour means of the two supervoxels; γ and β are coefficients; and pers(v_i, v'_j), the degree of continuity of the two supervoxels across frames, is the number of pixels of the previous-frame spatio-temporal over-segmentation v'_j displaced by the optical flow into the next-frame over-segmentation v_i, divided by the total number of pixels of v'_j.
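The pers ratio can be sketched directly from its definition (the function name and mask/flow layout are assumptions; the flow would again come from the optical-flow step):

```python
import numpy as np

def persistence(prev_sv_mask, next_sv_mask, flow):
    """Continuity pers(v_i, v'_j): the fraction of the pixels of the
    previous-frame supervoxel v'_j whose flow displacement lands
    inside the current-frame supervoxel v_i.

    prev_sv_mask, next_sv_mask : (H, W) bool masks of the two regions
    flow                       : (H, W, 2) per-pixel (dx, dy) flow
    """
    h, w = prev_sv_mask.shape
    ys, xs = np.nonzero(prev_sv_mask)
    if len(xs) == 0:
        return 0.0
    nx = np.clip(np.rint(xs + flow[ys, xs, 0]).astype(int), 0, w - 1)
    ny = np.clip(np.rint(ys + flow[ys, xs, 1]).astype(int), 0, h - 1)
    # Count displaced pixels that land inside the current supervoxel.
    return float(np.count_nonzero(next_sv_mask[ny, nx])) / len(xs)
```

A high pers value scales up K_ij in formula (30), so changing the label across a strongly continuous pair of supervoxels becomes expensive.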
Once this energy equation is established, the t-links (edges between nodes and terminal nodes) and n-links (edges between nodes) are all in place, giving the graph required for graph cut; graph cut can then minimise the energy equation to obtain the segmentation. An iterative scheme similar to GrabCut is again used: each iteration refines the parameters of the GMMs modelling target and background, improving the segmentation. The salient object segmentation of each frame is thus obtained. Fig. 9 is a schematic diagram of the salient object graph-cut result in the video frames.
The specific embodiments described above further explain in detail the purpose, technical solution and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and does not limit it; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A video salient object segmentation method based on supervoxel graph cut, the method comprising the following steps:
Step 1: segmenting the salient object in the first frame of the video sequence, which further comprises:
Step 101: over-segmenting the frame to obtain super-pixels; Step 102: computing a static saliency map from contrast and colour-distribution features; Step 103: computing a dynamic saliency map from the contrast and continuity of the optical-flow magnitude; Step 104: fusing the static saliency map and the dynamic saliency map to obtain a static-dynamic saliency map; Step 105: computing the objectness of the first frame and the candidate ROI region of each potential object; Step 106: fusing the static-dynamic saliency map with the object ROIs and filtering out unwanted ROI regions; Step 107: with the ROI region and the static-dynamic saliency as weak constraints, constructing an energy equation and segmenting with iterative graph cut to obtain an estimate of the salient object;
Step 2: segmenting the salient object of each frame of the video sequence other than the first, which further comprises:
Step 201: propagating the estimated region of the previous frame to the next frame as a prior; Step 202: applying steps 101, 102, 103, 104 and 105 to this frame to compute the required mid-level feature values; Step 203: computing the spatio-temporal over-segmentation of the video, constructing an energy equation over appearance, motion, objectness and persistence, and minimising this energy equation with graph cut to obtain the salient object segmentation.
2. The method of claim 1, characterized in that step 101 further comprises: clustering pixels with similar colour and close position based on the Lab colour and the position x, y of each frame image, obtaining the over-segmentation of the single-frame image, i.e. super-pixels, wherein the Lab values are the three dimensions of the Lab colour space and x, y are the horizontal and vertical coordinates of the pixel.
3. The method of claim 1, characterized in that steps 102 and 103 further comprise: both the static saliency map and the dynamic saliency map first require a centre-surround contrast saliency map and a distribution-compactness saliency map; the static saliency map first computes a colour-contrast saliency map and a colour-distribution-consistency saliency map, the final static saliency map being the fusion of the two; the dynamic saliency map is likewise obtained by fusing a saliency map of optical-flow-magnitude contrast with a saliency map of optical-flow motion continuity.
4. The method of claim 1, characterized in that step 104 further comprises: analysing the respective strengths and weaknesses of the dynamic saliency map and the static saliency map, and fusing the two under threshold control with a piecewise function to obtain the static-dynamic saliency map.
5. The method of claim 1, characterized in that step 105 further comprises: using an objectness detector to detect the ROI regions of the frame that may be objects.
6. The method of claim 1, characterized in that step 106 further comprises: using the coverage of the static-dynamic salient region by each ROI region to filter out some objectness ROI candidates, screening for the ROI regions that may contain the salient object.
7. The method of claim 1, characterized in that in step 107 an energy equation over the ROI region and the static-dynamic saliency map is established, and iterative graph cut is used to minimise this energy equation, minimising the segmentation cost.
8. The method of claim 1, characterized in that in step 201 the displacement of the salient object segmentation region obtained in the previous frame is estimated from the direction and magnitude of the optical-flow motion and propagated to the next frame.
9. The method of claim 1, characterized in that in step 202 a pixel-level objectness map is computed from saliency, colour-contrast and edge-detection information.
10. The method of claim 1, characterized in that in step 203 constructing the energy equation further comprises: constructing a persistence pairwise term from the previous-frame prior estimate propagated in step 202, constructing an object appearance unary term from the static-dynamic saliency map, constructing a colour-continuity pairwise term from appearance colour, and constructing an objectness pairwise term from objectness; finally, iterative graph cut is again used to minimise this energy equation, minimising the segmentation penalty and thereby obtaining the binary segmentation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410366737.5A CN104134217B (en) | 2014-07-29 | 2014-07-29 | Video salient object segmentation method based on super voxel graph cut |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104134217A CN104134217A (en) | 2014-11-05 |
CN104134217B true CN104134217B (en) | 2017-02-15 |
Family
ID=51806886
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410366737.5A Active CN104134217B (en) | 2014-07-29 | 2014-07-29 | Video salient object segmentation method based on super voxel graph cut |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104134217B (en) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105809651B (en) * | 2014-12-16 | 2019-02-22 | 吉林大学 | Image significance detection method based on the comparison of edge non-similarity |
CN105069774B (en) * | 2015-06-30 | 2017-11-10 | 长安大学 | The Target Segmentation method of optimization is cut based on multi-instance learning and figure |
CN106611427B (en) * | 2015-10-21 | 2019-11-15 | 中国人民解放军理工大学 | Saliency detection method based on candidate region fusion |
CN105590100B (en) * | 2015-12-23 | 2018-11-13 | 北京工业大学 | Surpass the human motion recognition method of voxel based on identification |
CN107154052B (en) * | 2016-03-03 | 2020-08-04 | 株式会社理光 | Object state estimation method and device |
CN105913456B (en) * | 2016-04-12 | 2019-03-26 | 西安电子科技大学 | Saliency detection method based on region segmentation |
CN105931244B (en) * | 2016-04-29 | 2019-01-22 | 中科院成都信息技术股份有限公司 | The unsupervised stingy drawing method of one kind and device |
CN106372636A (en) * | 2016-08-25 | 2017-02-01 | 上海交通大学 | HOG-TOP-based video significance detection method |
CN106778634B (en) * | 2016-12-19 | 2020-07-14 | 江苏慧眼数据科技股份有限公司 | Salient human body region detection method based on region fusion |
CN107016675A (en) * | 2017-03-07 | 2017-08-04 | 南京信息工程大学 | A kind of unsupervised methods of video segmentation learnt based on non local space-time characteristic |
CN107133558B (en) * | 2017-03-13 | 2020-10-20 | 北京航空航天大学 | Infrared pedestrian significance detection method based on probability propagation |
CN107194948B (en) * | 2017-04-17 | 2021-08-10 | 上海大学 | Video significance detection method based on integrated prediction and time-space domain propagation |
CN107038704B (en) * | 2017-05-04 | 2020-11-06 | 季鑫 | Retina image exudation area segmentation method and device and computing equipment |
CN107564022B (en) * | 2017-07-13 | 2019-08-13 | 西安电子科技大学 | Saliency detection method based on Bayesian Fusion |
CN108229290B (en) | 2017-07-26 | 2021-03-02 | 北京市商汤科技开发有限公司 | Video object segmentation method and device, electronic equipment and storage medium |
CN109035293B (en) * | 2018-05-22 | 2022-07-15 | 安徽大学 | Method suitable for segmenting remarkable human body example in video image |
CN109191485B (en) * | 2018-08-29 | 2020-05-22 | 西安交通大学 | Multi-video target collaborative segmentation method based on multilayer hypergraph model |
CN109509194B (en) * | 2018-11-23 | 2023-04-28 | 上海师范大学 | Front human body image segmentation method and device under complex background |
CN109785327A (en) * | 2019-01-18 | 2019-05-21 | 中山大学 | The video moving object dividing method of the apparent information of fusion and motion information |
CN110347870A (en) * | 2019-06-19 | 2019-10-18 | 西安理工大学 | The video frequency abstract generation method of view-based access control model conspicuousness detection and hierarchical clustering method |
CN110390293B (en) * | 2019-07-18 | 2023-04-25 | 南京信息工程大学 | Video object segmentation algorithm based on high-order energy constraint |
CN111182307A (en) * | 2019-12-27 | 2020-05-19 | 广东德融汇科技有限公司 | Ultralow code stream lossless compression method based on video images for K12 education stage |
CN112884302B (en) * | 2021-02-01 | 2024-01-30 | 杭州市电力设计院有限公司 | Electric power material management method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102637253A (en) * | 2011-12-30 | 2012-08-15 | 清华大学 | Video foreground object extracting method based on visual saliency and superpixel division |
CN103632153A (en) * | 2013-12-05 | 2014-03-12 | 宁波大学 | Region-based image saliency map extracting method |
CN103745468A (en) * | 2014-01-07 | 2014-04-23 | 上海交通大学 | Significant object detecting method based on graph structure and boundary apriority |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102779338B (en) * | 2011-05-13 | 2017-05-17 | 欧姆龙株式会社 | Image processing method and image processing device |
US8989437B2 (en) * | 2011-05-16 | 2015-03-24 | Microsoft Corporation | Salient object detection by composition |
Non-Patent Citations (1)
Title |
---|
Video saliency filter fusing colour and motion information; Luo Lei et al.; Journal of Huazhong University of Science and Technology (Natural Science Edition); 2014-02-28; Vol. 42, No. 2; full text * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104134217B (en) | Video salient object segmentation method based on super voxel graph cut | |
Boulch et al. | SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks | |
Wang et al. | Semantic line framework-based indoor building modeling using backpacked laser scanning point cloud | |
Zhang et al. | Semantic segmentation of urban scenes using dense depth maps | |
CN108257139B (en) | RGB-D three-dimensional object detection method based on deep learning | |
Lei et al. | Region-tree based stereo using dynamic programming optimization | |
US8798965B2 (en) | Generating three-dimensional models from images | |
Garcia-Dorado et al. | Automatic urban modeling using volumetric reconstruction with surface graph cuts | |
AU2012244275A1 (en) | Method, apparatus and system for determining a boundary of an obstacle which occludes an object in an image | |
CN113378756B (en) | Three-dimensional human body semantic segmentation method, terminal device and storage medium | |
CN104715451A (en) | Seamless image fusion method based on consistent optimization of color and transparency | |
Tian et al. | Adaptive and azimuth-aware fusion network of multimodal local features for 3D object detection | |
Zhu et al. | Large scale urban scene modeling from MVS meshes | |
CN105931180A (en) | Salient information guided image irregular mosaic splicing method | |
Maltezos et al. | Automatic detection of building points from LiDAR and dense image matching point clouds | |
Du et al. | ResDLPS-Net: Joint residual-dense optimization for large-scale point cloud semantic segmentation | |
Li et al. | Seamline network generation based on foreground segmentation for orthoimage mosaicking | |
CN108388901A (en) | Collaboration well-marked target detection method based on space-semanteme channel | |
Li et al. | 3DCentripetalNet: Building height retrieval from monocular remote sensing imagery | |
Li et al. | Spatiotemporal road scene reconstruction using superpixel-based Markov random field | |
Tosteberg | Semantic segmentation of point clouds using deep learning | |
Bricola et al. | Morphological processing of stereoscopic image superimpositions for disparity map estimation | |
CN103733207A (en) | Method of image segmentation | |
Hoiem | Seeing the world behind the image | |
Laupheimer et al. | Juggling with representations: On the information transfer between imagery, point clouds, and meshes for multi-modal semantics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |