CN107644429B - Video segmentation method based on strong target constraint video saliency - Google Patents

Video segmentation method based on strong target constraint video saliency Download PDF

Info

Publication number
CN107644429B
CN107644429B CN201710946156.2A CN201710946156A CN107644429B CN 107644429 B CN107644429 B CN 107644429B CN 201710946156 A CN201710946156 A CN 201710946156A CN 107644429 B CN107644429 B CN 107644429B
Authority
CN
China
Prior art keywords
frame
target
pixel
video
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710946156.2A
Other languages
Chinese (zh)
Other versions
CN107644429A (en
Inventor
韩守东
张珑
刘昱均
陈阳
胡卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Shenzhen Huazhong University of Science and Technology Research Institute
Original Assignee
Huazhong University of Science and Technology
Shenzhen Huazhong University of Science and Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology, Shenzhen Huazhong University of Science and Technology Research Institute filed Critical Huazhong University of Science and Technology
Priority to CN201710946156.2A priority Critical patent/CN107644429B/en
Publication of CN107644429A publication Critical patent/CN107644429A/en
Application granted granted Critical
Publication of CN107644429B publication Critical patent/CN107644429B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video segmentation method based on strong target constraint video saliency, and belongs to the technical field of image processing. The method introduces strong target constraint on the basis of image significance, namely the position and scale constraint of a target are obtained through a multi-scale tracking algorithm and optical flow correction, the color constraint of the target is obtained through a historical frame segmentation result, and a video significance result is obtained through calculation; performing histogram classification operation on the video significance result to obtain a label mask image, and calculating a prior probability model of a front/background of a current frame; constructing a spatial-temporal continuous full-connection conditional random field model based on superpixels in a current frame, defining a data item by using the prior probability model, defining an intra-frame smooth item and an inter-frame smooth item by combining the color distance, the spatial distance and the edge relation among the superpixels, and performing optimization solution by using a fast high-dimensional Gaussian filtering algorithm to complete video target segmentation. The method effectively improves the accuracy and time efficiency of video segmentation.

Description

Video segmentation method based on strong target constraint video saliency
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a video segmentation method based on strong target constraint video saliency.
Background
Video segmentation is a technology for segmenting a foreground object in each frame of image in a video, namely completely determining a foreground contour. The video segmentation is usually in the bottommost link in the video processing algorithm, and the segmentation result is used for video application problems of upper layers, such as foreground feature extraction, foreground classification, foreground identification and the like of the video. Meanwhile, in industrial production, there is also a need for video segmentation, such as video splicing, three-dimensional video reconstruction, video semantic analysis, and the like.
In the video segmentation problem, the foreground region in each frame of image needs to be segmented. Generally, the video segmentation algorithm reduces the problem into two parts, namely, foreground possible region identification and image segmentation, namely, firstly, determining a possible foreground region in the current video frame image, then refining the region and segmenting the region by using the existing image segmentation method to obtain the foreground segmentation result of the current frame. Video segmentation can be classified into unsupervised video segmentation and supervised video segmentation according to the presence or absence of interaction volume. The unsupervised video segmentation method means that no manual interaction is needed, namely, after an input video is given, an algorithm directly calculates to obtain a processing result. The full-automatic video segmentation mode avoids manpower consumption, so the application cost is low. However, since the unsupervised video segmentation method always calculates all regions in the video frame, the processing efficiency is very low, and the accuracy of the segmentation result is general, which is difficult to be used in many practical situations. The supervised video segmentation method needs to add manual interaction, and generally selects to mark a foreground region to be segmented in a first frame of a video, or selects partial key frames in the video and marks a foreground, and also selects to pre-establish a target learning model aiming at a specific class of foreground. According to the obtained prior/background information, the target area of the video to be processed is reduced and the area is calculated in a refined mode, the time efficiency of the video segmentation algorithm is greatly improved at the cost of sacrificing a small amount of labor consumption, and meanwhile, a more accurate segmentation result is obtained, so that the method has higher practical value in industrial production.
However, even with manual interaction, only the foreground probable region of a very small portion of the keyframes can be determined. Therefore, how to obtain the foreground possible regions in all video frames according to the key frame prior information is a big problem in the video segmentation algorithm. For the problem, solutions such as directed acyclic graphs, point trajectory tracking, saliency and the like exist in the field of video segmentation, but many assumptions and harsh requirements exist on application scenes of the solutions. For the application of the image segmentation algorithm in video segmentation, the Graphcut model can obtain a more effective segmentation result, so that the Graphcut model is widely adopted in the field of image/video segmentation. However, the method has the disadvantages of low computational efficiency, and the problem is more obvious for video segmentation with large data volume. In order to adapt to the problem of huge data volume in video processing, a more concise and faster image segmentation algorithm needs to be introduced into video segmentation.
Disclosure of Invention
Aiming at the defects or improvement requirements in the prior art, the invention provides a video segmentation method based on strong target constraint video saliency, and aims to adopt strong target constraint video saliency extraction and a spatial-temporal continuous full-connection conditional random field video segmentation algorithm based on superpixels, so that the problems of inaccuracy and low efficiency of the existing video segmentation technology are solved.
In order to achieve the above object, the present invention provides a video segmentation method for constraining video saliency based on a strong target, the method comprising:
(1) calculating the optical flow motion information and the super-pixel segmentation result of the video frame;
(2) performing interactive segmentation on a first frame picture in a video frame sequence to obtain a target frame; calibrating a target foreground through a target frame and completing segmentation to obtain a target segmentation result; initializing a multi-scale tracking model by using the position and size information of the target frame on the basis of the target segmentation result;
(3) reading a next frame of image, obtaining a target frame of a current frame by using a multi-scale tracking model, correcting the target frame of the current frame by using the optical flow motion information of the current frame and a target segmentation result of a previous frame to obtain a corrected target frame, and updating the multi-scale tracking model by using the target position and scale information of the target frame;
(4) acquiring a target color model of the current frame according to the target segmentation results of the first frame and the previous frame; integrating the target color model, the target position and the scale information into image significance in a strong target constraint mode, and calculating to obtain strong target constraint video frame significance;
(5) threshold value limiting and histogram segmentation operations are carried out on the video frame significance result to obtain a rough label mask image of the current frame, and a prior probability model of the front/background of the current frame is calculated by combining the target segmentation result of the first frame;
(6) establishing a spatial-temporal continuous full-connection conditional random field model based on superpixels, defining data items by using a front/background prior probability model of a current frame, and defining an intra-frame smoothing item and an inter-frame smoothing item by combining color distances, spatial distances and edge relations among the superpixels; performing optimization solution by using a fast high-dimensional Gaussian filtering algorithm to obtain a target segmentation result of the current frame;
(7) and (5) repeating the steps (3) to (6) until the video segmentation is finished.
Further, the calculating process of the optical flow motion information in the step (1) includes:
(11) computing the optical flow gradient strength at pixel q in the video frame:
Figure BDA0001426911540000031
wherein λ ismRepresenting an intensity parameter;
Figure BDA0001426911540000032
represents the gradient value at pixel q;
(12) calculating the maximum value of the difference of the motion directions of each pixel and the adjacent pixels in the video frame:
Figure BDA0001426911540000033
wherein,
Figure BDA0001426911540000034
Representing the angular difference between pixel q and pixel q'; qqIs the neighborhood of pixel q; lambda [ alpha ]θRepresenting an angle difference parameter;
(13) the pixel velocity difference in video frames is:
Figure BDA0001426911540000035
wherein, TmRepresents a decision threshold; the invention uses the histogram iteration optimal threshold value method to self-adaptively calculate and obtain the decision threshold value Tm
(14) If b is judgedqIf the difference is larger than the difference threshold value, if so, the pixel q is a contour pixel of the motion area; the value range of the difference threshold is 0.4-0.6, preferably 0.5;
(15) starting from the direction of 12 points by taking any pixel in a video frame as a reference, emitting a ray clockwise every 45 degrees to obtain 8 rays, respectively counting the intersection times of the ray and a contour pixel, and if more than 4 odd rays intersected with the contour pixel exist, judging the pixel as an effective motion pixel; otherwise, the pixel is determined to be a noisy pixel.
Further, the calculating the super-pixel segmentation result in the step (1) specifically includes:
(16) the video frame is segmented by adopting an SLIC over-segmentation method to obtain M super-pixels riRepresenting a superpixel, i ∈ M, riThe number of pixels in the region is O, and the region riThe center of gravity of (a) represents a superpixel coordinate, it can be found that:
Figure BDA0001426911540000041
Figure BDA0001426911540000042
wherein the content of the first and second substances,
Figure BDA0001426911540000043
and
Figure BDA0001426911540000044
respectively representing a super pixel riThe abscissa and ordinate of the o-th pixel within the region,
Figure BDA0001426911540000045
and
Figure BDA0001426911540000046
then the horizontal and vertical coordinates of the super pixel are represented;
(17) super pixel r1And r2The physical distance between the two is characterized by:
Figure BDA0001426911540000047
wherein, thetaγRepresenting a physical distance characteristic parameter; thetaγThe value range is 2-4, preferably 3;
(18) super pixel r1And r2The color distance between the two is characterized by:
Figure BDA0001426911540000048
wherein n is1And n2Respectively representing the number of respective color types in the two super pixel areas; f (c)kI) represents the probability that the color of the ith pixel in the kth super-pixel region appears in the region; d (c)1,i,c2,j) Represents a pixel c1,iAnd c2,jThe Euclidean distance of the color;
(19) the edge characteristics under superpixel conditions are:
Figure BDA0001426911540000051
wherein, thetaαAnd thetaβIs an edge feature parameter. ThetaαThe value range of (a) is 18-22, thetaβHas a value range of 30~35,θαAnd thetaβPreferably 20 and 33, respectively.
Further, the step (3) of correcting the current frame target frame by using the optical flow motion information and the target segmentation result of the previous frame specifically includes:
(31) obtaining effective motion pixels in the current frame image;
(32) counting the proportion of the effective moving pixels in the current frame target frame, and if the proportion is greater than a proportion threshold value, entering a step (33); otherwise, entering a step (34); the value range of the proportional threshold is 0.7-0.9, preferably 0.8;
(33) respectively comparing the circumscribed rectangle of the effective motion pixels in the current frame target frame and the current frame target frame with the circumscribed rectangle of the previous frame target segmentation result; judging the smaller difference as the correction result; if the two and the circumscribed rectangle of the last frame of the segmentation result are both larger than the maximum difference threshold value, using the circumscribed rectangle of the last frame of the segmentation result as a correction result; the value range of the maximum difference threshold is 25-45%, preferably 35%; entering a step (35);
(34) judging a current frame target frame as a correction result;
(35) the length and the width of the correction result are respectively expanded by the multiple proportion, and if the expanded correction result is smaller than the original image, the correction result is returned as a current frame target frame; otherwise, expanding the correction result to the size of the original image and returning the result as the current frame target frame; the value range of the ratio multiple is 1.0-1.4, preferably 1.2.
Further, the step (4) of obtaining the target color model of the current frame according to the target segmentation result of the first frame and the previous frame specifically includes:
(41) establishing a Gaussian mixture model for the target foreground pixels of the first frame to obtain a color model Hfore-1(ii) a Respectively establishing a mixed Gaussian model for the foreground and the background of the target in the previous frame to obtain a foreground color model Hfore-2And a background color model Hback-2
(42) The target foreground color model of the current frame is:
Figure BDA0001426911540000066
the color model of the target background of the current frame is Hback=Hback-2(ii) a Wherein the content of the first and second substances,
Figure BDA0001426911540000061
weighting coefficients for the color model;
Figure BDA0001426911540000062
the value range of (a) is 0.2-0.4, preferably 0.3;
(43) super pixel qkForeground probability value Hfore-F(qk) Or probability value H of the backgroundback-F(qk) Comprises the following steps:
Figure BDA0001426911540000063
Hback-F(qk)=1-Pfore(qk);
wherein, the foreground color model HforeAnd a background color model HbackFurther obtaining a super pixel q in the current framekProbability H of belonging to the foreground respectivelyfore(qk) And probability of belonging to the background Hback(qk)。
Further, the calculating in the step (4) to obtain the saliency of the strong target constraint video frame specifically includes:
the closer the region is to the target color model, the stronger the saliency, and thus the video frame saliency:
Figure BDA0001426911540000064
wherein D iss(rk,ri) Representing a super pixel rkRegion and super pixel riThe center of gravity distance of the region; sigmasIs a distance coefficient; w (r)i) Is a super pixel riFor super pixel rkWeight of significance, riThe more the number of pixels in the region, the greater the influence; dr(rk,r2) Is rkAnd riThe color distance of (d);
for any super pixel rkRegion, target frame b, dis (r)kAnd b) obtaining the distance weight w of the superpixel from the center of the image as a function of the distance between the regionss(rk):
Figure BDA0001426911540000065
Wherein, tiI is 1,2,3 is the empirical weight, T1And T2Is an empirical threshold, tiI is 1,2,3 and T is between 0 and 11And T2The value of (a) is determined by the size of the image;
Sv(rk) The color model weights in (1) are:
Figure BDA0001426911540000071
wherein, the Hfore-F(rk) Representing a super pixel rkThe foreground probability value of (1).
Further, the step (5) specifically includes:
(51) the region saliency is classified into three classes of labels by a threshold definition and histogram segmentation algorithm:
Figure BDA0001426911540000072
for each super pixel region rkSetting label (r)k) Expressed as background, unknown and foreground regions using the values 0, 1 and 2, respectively; sv(rk) Representing a video frame saliency result; setting a defined threshold TbasicWhen the super pixel rkIs less than TbasicThen r iskThe area is a background area; t ishisIs a significant mean value, greater than ThisIs marked as a foreground region; the remaining part is marked as an unknown region; t isbasicThe value range of (A) is 0.4-0.5, preferably 0.45;
(52) calculating a prior probability model of the front/background of the current frame:
Θfore=Θfore-1+ρΘfore-S
Θback=Θback-S
wherein rho is a color model weighting coefficient; thetafore-SRepresenting a foreground mixed Gaussian model constructed by the foreground region pixels; thetaback-SRepresenting a background mixed Gaussian model constructed by the pixels in the background area; thetafore-1A Gaussian mixture model representing the foreground of the first frame of the target; the value range of rho is 0.3-0.4, preferably 0.35;
(53) normalizing the foreground probability value Θ of the current framefore-FComprises the following steps:
Figure BDA0001426911540000073
further, the spatial-temporal continuous full-connection conditional random field model based on the superpixel in the step (6) is specifically as follows:
defining random variables
Figure BDA0001426911540000081
Figure BDA0001426911540000082
Representing a super pixel region riThe segmentation tags of (a) are set,
Figure BDA0001426911540000083
0 is background and 1 is foreground; establishing a spatial-temporal continuous full-connection conditional random field model based on superpixels:
Figure BDA0001426911540000084
therein, ΨiIs a data item; ΨijIs an intra smoothing term; phiijIs an inter-frame smoothing term; m and N respectively represent the total number of superpixels in the current frame and the adjacent frame;
super pixel riForeground probability of
Figure BDA0001426911540000085
Comprises the following steps:
Figure BDA0001426911540000086
wherein, thetafore-F(qo) Representing the current frame pixel qoA foreground probability value of (d); super pixel riThe number of pixels in the region is O;
data item ΨiComprises the following steps:
Figure BDA0001426911540000087
intra smoothing term definition ΨijComprises the following steps:
Ψij(ri,rj)=w1fdis(ri,rj)+w2Dr(ri,rj)+w3fedge(ri,rj) i,j∈M,
wherein f isdis(ri,rj) Representing a super pixel ri,rjA physical distance feature therebetween; dr(ri,rj) Representing a super pixel ri,rjA color distance feature therebetween; f. ofedge(ri,rj) Representing a super pixel ri,rjEdge features in between; w is a1、w2And w3Respectively represents fdis(ri,rj)、Dr(ri,rj) And fedge(ri,rj) The proportion occupied in the intraframe smoothing term; w is a1、w2And w3The value ranges of (a) are respectively 5-6, 9-10 and 1-2, w1、w2And w3Preferably 6, 10 and 2, respectively;
inter-frame smoothing term ΦijComprises the following steps:
Figure BDA0001426911540000088
Figure BDA0001426911540000089
wherein the content of the first and second substances,
Figure BDA0001426911540000091
is a neighboring frame region rjα is a time domain connection distance characteristic parameter, and the value range of α is 3-5, preferably 4.0.
Generally, compared with the prior art, the technical scheme of the invention has the following technical characteristics and beneficial effects:
(1) the method introduces strong target constraint on the basis of the image significance, and obtains the significance result of the video level, namely the strong target constraint video significance provided by the invention is used for enabling the target segmentation process to be efficient and accurate in video segmentation;
(2) in the single-frame image segmentation link, the method is different from the traditional video segmentation method which uses a graph cut segmentation model to perform single-frame image segmentation processing, the method uses and improves the image segmentation algorithm of a full-connection condition random field, introduces superpixels to replace pixels as a basic modeling unit, increases video inter-frame connection, constructs a space-time continuous full-connection condition random field video segmentation model based on the superpixels, and can effectively improve the accuracy and the time efficiency of video segmentation.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a first frame of video with an initial tracking frame of a target marked with artificial markers according to an embodiment of the present invention;
FIG. 3 is an intermediate result of the target scale and location information correction process in an embodiment of the present invention;
FIG. 4 is a saliency map computed using a video saliency method in an embodiment of the present invention;
fig. 5 shows a segmentation result calculated by using a video segmentation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention mainly comprises two aspects, which are respectively as follows:
firstly, extracting the saliency of a strong target constraint video;
video saliency evolved from the concept of image saliency, and it is generally believed that image saliency characterizes the most salient regions in the current image, i.e., the regions most likely to be labeled as foreground, and that effective saliency results will greatly reduce human resource consumption in computer vision problems. Meanwhile, in an image, a salient region generally satisfies two assumptions: the color difference between the salient region and any other region in the image is large; the salient region is closer to the center of the image than other regions. However, the features present in video are richer than images, and the present invention extracts saliency results in the video context by adding strong target constraints to the assumption of image saliency. First, the present invention proposes:
firstly, obtaining the position and scale information of a target by a multi-scale tracking algorithm and optical flow correction;
then, estimating a color model of the target through a front/background Gaussian mixture model of the historical frame segmentation result;
and finally, blending the color model and the corrected target position and scale information into the image significance in a strong target constraint mode, and calculating to obtain a strong target constraint video significance result.
1. Optical flow motion information calculation:
firstly, an optical flow algorithm is used among video frame sequences to obtain an optical flow field in each frame of image. While the motion region in the optical flow field results has two distinct features: on one hand, the motion states of all pixels in the motion area are consistent and have obvious outlines, and the motion states of the pixels in the non-motion area are disordered and have no obvious object area outlines; on the other hand, the motion direction of the edge pixel of the motion area is greatly different from the motion direction of the pixel of the non-motion area of the neighborhood. Therefore, the present invention requires preprocessing of the optical flow field according to the above features to obtain a more efficient optical flow area.
First, the gradient of the optical flow field of the video frame is calculated. In the motion area, the gradient values are all close to 0 because the motion states of the pixels are consistent; outside the motion region, the pixel motion has no significant features, and the gradient value is chaotic and small; at the edge of the motion area, the gradient value of the edge of the motion area is larger due to the great motion difference of the inner pixel and the outer pixel, so that the motion edge image of the video frame can be obtained after the result is normalized. In the present invention, the gradient value at pixel q is defined as
Figure BDA0001426911540000101
It represents the velocity of the pixel, and the optical flow gradient strength at pixel q can then be expressed as:
Figure BDA0001426911540000111
wherein λ ismRepresenting an intensity parameter.
According to
Figure BDA0001426911540000112
The gradient strength result of the represented video frame can distinguish most of the contour pixels of the motion area from the global non-contour pixels. In particular, the gradient strength value is greater than some decision threshold TmThe motion of the pixels is significant enough, so that the pixels can be immediately judged as contour pixels of a motion area; less than T for gradient intensity valuesmThe pixels in (2) need to further determine their attribution by using the characteristic that the motion direction of the contour pixels in the motion area is different from that of the surrounding pixels. Therefore, the present invention also requires the computation of each imageMaximum of motion direction (angle) difference of pixel and neighborhood pixel:
Figure BDA0001426911540000113
wherein the content of the first and second substances,
Figure BDA0001426911540000114
representing the angular difference between pixel Q and pixel Q', QqIn the neighborhood of pixel q, λθRepresenting an angular difference parameter.
By combining the gradient intensity values of the optical flow field and the difference of the motion directions between pixels, we can obtain the following difference result of pixel speeds in the video frame:
Figure BDA0001426911540000115
since the threshold value T is determined under different scenesmThe difference exists, therefore, the invention uses the histogram iteration optimal threshold method to self-adaptively calculate and obtain the decision threshold value Tm. For bqPixels greater than 0.5, we consider them to be motion region contour pixels.
From the above operation, a rough contour map of a motion region in a video frame can be obtained. However, in video scenes where there is a junction of distant and close views, the noise interference in the coarse silhouette cannot be eliminated using morphological processing. Therefore, it is necessary to further extract the motion information of the video frame on the basis of the rough contour map. It is observed that noise generally does not have a distinct contour, and thus the noise effect can be removed according to the contour characteristics. Specifically, the method takes any pixel in a video frame as a reference, and emits a ray every 45 degrees clockwise from the direction of 12 points to obtain 8 rays, and counts the intersection times of each ray and the contour pixel respectively. Obviously, when the pixel is inside the contour of the closed region, the number of times each ray intersects the contour should be odd; when the pixel is outside the contour of the occlusion region, the number of times each ray intersects the contour should be an even number of times. If more than 4 rays which intersect the contour for odd times exist in the pixel, the pixel is judged as a pixel in the contour, namely an effective motion pixel; otherwise, the pixel is determined to be an out-of-contour pixel, i.e., a noise pixel. Particularly, an integral graph algorithm can be used for quickly obtaining a judgment result. Through this process, we can obtain the optical flow motion information in the video frame.
2. Calculating the target position and scale information:
the target position and scale information cannot be accurately obtained by simply using a target tracking algorithm or an optical flow algorithm. Such as: when the target area moves slowly, the optical flow field is in a disordered state, and the position and the scale of the foreground target can be effectively positioned by a tracking algorithm; when the target area moves violently, the optical flow field presents a state of motion consistency in the moving area, and the tracking algorithm is easy to lose or bias the foreground target. Therefore, in order to obtain Accurate target position and scale information, the invention firstly uses a multi-scale target Tracking algorithm DSST (Accurate scale estimation for Robust Visual Tracking) based on kernel-correlation filtering to obtain preliminary foreground target position and scale information, and then verifies and necessarily corrects the Tracking result by using the optical flow motion information and the target segmentation result of the previous frame. The specific calculation process is as follows:
(1) and verifying whether the optical flow field is effective, wherein the optical flow field effectiveness judgment criterion is whether the proportion of effective motion pixels positioned in the video tracking target frame is greater than a given threshold value, and the threshold value is usually set to be 0.8. If the current frame optical flow result is invalid, abandoning the current frame optical flow result and entering into the step (4), and if the current frame optical flow result is valid, entering into the step (2);
(2) calculating optical flow motion information and a contour thereof;
(3) comparing the tracking result with the optical flow motion contour, and if the difference is small, entering correction Step 4; otherwise, taking the video tracking result as a correction result;
(4) respectively comparing the tracking result and the optical flow motion contour with the previous frame segmentation result, judging the one with small difference as a correction result, and if the two are greatly different from the previous frame segmentation result, using the previous frame segmentation result as the correction result;
(5) and enlarging the correction result obtained by the operation by a certain proportion and returning the enlarged correction result as an effective correction result. Through the operation, accurate target position and scale information can be obtained through calculation;
(6) and updating tracking model parameters.
The accurate foreground target position and scale information not only provides accurate foreground target characteristic information for video frame segmentation, but also greatly compresses a region to be segmented, most background regions are removed after correction operation, and the accuracy of the segmentation algorithm is further improved due to the reduction of redundant information. In the stage of executing the video frame target segmentation, the algorithm operation efficiency is obviously improved only by establishing a segmentation graph model for the region to be segmented.
3. Target color model estimation:
besides the position and scale information of the target, the invention also estimates the color model of the target according to the existing target segmentation result. In a video scene, the color model of a foreground object to be segmented usually has only slight change in a shot, and the color model is basically consistent as a whole. In the video input stage, the video segmentation algorithm provided by the invention performs manual interactive segmentation on the first frame of video to obtain an accurate target segmentation result, and the color model of the foreground target of the first frame represents the color model of the target in a complete shot to a great extent. At the same time, we note that the color model of the current frame object is closest to the color model of the object in the previous frame video. Therefore, the color model of the foreground object in the first frame is used as the basic color model, and is fused into the color model in the previous frame in a weighting mode, so that the color model of the current frame object can be estimated.
The color model H can be obtained by establishing a mixed Gaussian model for the foreground target of the first framefore-1Respectively establishing a mixed Gaussian model for the foreground target and the background of the previous frame to obtain a color model Hfore-2,Hback-2. From Hfore-1、Hfore-2Weighting can be estimated to obtain a foreground target color model H of the current framefore
Figure BDA0001426911540000131
Hback=Hback-2
In the above formula, the first and second carbon atoms are,
Figure BDA0001426911540000132
weighting coefficients for color models, in general
Figure BDA0001426911540000133
The value was set to 0.3. Meanwhile, we find that the scenes of two frames of videos in the same shot are similar, so the invention uses the background color model H in the last frameback-2To estimate the background color model H of the current frameback
Calculating a foreground color model H according to the estimationforeAnd a background color model HbackEach pixel q in the video frame may be further obtainedkProbability H of belonging to the foreground respectivelyfore(qk) And probability of belonging to the background Hback(qk). Since the probability of a pixel belonging to the foreground and the background may be quite close, in order to better determine the probability to which the pixel label belongs, the probability value of the pixel belonging to the foreground or the background is normalized by combining the foreground and background color models.
Figure BDA0001426911540000141
Hback-F(qk)=1-Pfore(qk)
Pixel foreground probability value obtained by the above formula
Figure BDA0001426911540000142
And background probability value
Figure BDA0001426911540000143
Effectively representing the target color model constraints of the video frame.
4. According to two assumptions of the saliency of an image,in the image, the greater the color difference value between any region and other regions is, the stronger the significance of the region is; the closer any region is to the center of the image, the more salient this region is. From this assumption, conventional image saliency can be obtained, with either region rkHas the significance of SI(rk):
Figure BDA0001426911540000144
In the formula, ws(rk) Indicating the region rkDistance from the center of the image, Ds(rk,ri) Indicating the region rkWith other regions r in the imageiDistance of center of gravity, σsIs a distance coefficient, w (r)i) Is a region riFor the region rkWeight of significance, riThe larger the number of pixels in the area, the larger the influence, and finally, the calculation area rkAnd region riColor distance D ofr(rk,ri):
Figure BDA0001426911540000145
The above formula defines the inter-region color distance D under the saliency conceptr(r1,r2) Wherein r is1,r2Representing two super-pixel regions, n1,n2Number of color classes representing two super-pixel regions, f (c)kI) denotes the probability that the color of the ith pixel in the kth region appears in the region, D (c)1,i,c2,j) Representing the euclidean distance of the pixel color.
In fact, in video, the extractable feature information is richer than in images. If the characteristics related to the upper frame and the lower frame exist in the video, the segmentation result of the target is extracted frame by using a video segmentation algorithm, so that more help can be provided for calculating the image/video saliency by using the segmented target information. As described above, by correcting the target tracking result by the optical flow motion information and the last frame target segmentation result, accurate position and scale information of the target can be acquired; meanwhile, the color model constraint of the current frame target can be calculated according to the existing segmentation result. These information all characterize the target characteristics to some extent.
Therefore, on the basis of the traditional image saliency, the hypothesis that the closer the image saliency is to the center of the image, the stronger the saliency is expanded, namely the region in the video which is considered to be closer to the center of the target is stronger in the saliency; meanwhile, the target color model is further introduced in the invention, namely the more the assumed region is close to the target color model, the stronger the significance is, and the video significance S can be obtainedv(rk):
Figure BDA0001426911540000151
In the formula, r is expressed for an arbitrary regionkB is the target frame, tiI is 1,2,3 is the empirical weight, T1And T2Is an empirical threshold, dis (r)kAnd b) as a function of the distance between the regions, the distance weight w of the region from the center of the image can be obtaineds(rk):
Figure BDA0001426911540000152
The piecewise function is used here to significantly increase the significance value in the scale and effectively decrease the significance value out of the scale, and other reasonable nonlinear functions can be used instead.
Further, S can be obtainedv(rk) Color model weight w ino(rk),
Figure BDA0001426911540000161
Wherein Hfore-F(rk) For estimating the calculated region r according to the target color modelkThe foreground probability value of (1).
And secondly, performing space-time continuous full-connection conditional random field video segmentation algorithm based on the superpixels.
1. Constructing a target color model based on the significance of the strong target constraint video:
the traditional video segmentation method based on object proposal segments a video frame into a plurality of object candidate regions through over-segmentation operation, finds out the region with the maximum object probability in the video frame from the candidate regions according to the corresponding object probability calculation criteria, and then constructs a color model of a foreground object by using the region. In general, the candidate region with the highest probability of an object in the algorithm can cover most of the foreground object. However, the super-pixel region obtained by using the over-segmentation method is difficult to completely cover the foreground object and even the segmentation error may occur, so that the video segmentation method based on the object proposal is difficult to construct an accurate object color model even if the most suitable object candidate region can be always selected.
From the experimental results of the strong target constraint video saliency algorithm presented above, it can be found that pixels within the foreground target region consistently exhibit high saliency, while pixels outside the target region have very low saliency or even almost 0. Meanwhile, by observing various video scenes, the appearance of the foreground object to be segmented in the video is integrally consistent under the whole lens, namely the foreground color models between any two frames are similar.
Under the initiation of the above discovery, the present invention proposes a method for constructing a target color model based on strong target constraint video saliency. The method firstly divides the significance result into three types of labels through a threshold value limiting and histogram segmentation algorithm:
Figure BDA0001426911540000162
for each super pixel region rkAll can set its label (r)k) The background, unknown and foreground regions are represented by the values 0, 1 and 2, respectively. At the same time, the algorithm sets a defined threshold TbasicWhen region rkIs less than TbasicThen, the region is considered as the background region, T in the present inventionbasicTake 0.45. Further, the method obtains a corresponding histogram by counting the significance values of the non-background areas, and takes the significance mean value in the histogram as a segmentation threshold value ThisThe non-background regions are divided into two categories, i.e. greater than ThisIs marked as foreground region and the rest is marked as unknown region.
According to the region label of the video frame, respectively using the pixels of the foreground label region and the pixels of the background label region to construct a foreground Gaussian mixture model thetafore-SAnd the background Gaussian mixture model thetaback-S. In the above, the present invention proposes that the color model of the target in the first frame is used as the basic model, and the target color model of the current frame is finally estimated by blending the color model of the target in the previous frame in a weighting manner. However, introducing the color model of the last video frame segmentation object may generate accumulated errors. Therefore, in the current link, the method also combines the strong target constraint video saliency result and the first frame target color model in a weighting mode to construct a more accurate current frame target color model:
Θfore=Θfore-1+ρΘfore-S
Θback=Θback-S
where ρ is a color model weighting coefficient, and the value of ρ is usually set to 0.35. Finally, the target foreground probability value theta is normalizedfore-FComprises the following steps:
Figure BDA0001426911540000171
2. a space-time continuous super-pixel video segmentation method based on strong target constraint video saliency comprises the following steps:
the full-connection conditional random field segmentation method is an image segmentation method based on the full-connection conditional random field. Because the method is fast in solving and excellent in segmentation effect, the space-time continuous full-connection conditional random field video segmentation method is designed by increasing inter-frame connection on the basis of the method. Meanwhile, in the image segmentation problem, pixels are generally used as basic operation units, and the pixels can sufficiently express information in image data, thereby ensuring high accuracy of segmentation results. However, in a video scene with tens of frames of images per second, still using pixels as the basic segmentation unit to calculate the video segmentation result consumes a lot of system resources and may result in poor video segmentation efficiency. In fact, in most cases, some similar and similar pixels express the same information, and if these pixels can be collectively treated as the same pixel in the preprocessing stage, the execution efficiency of the video segmentation algorithm can be certainly greatly improved. Therefore, the present invention proposes to use superpixels instead of pixels as the basic unit of operation for the segmentation algorithm. Under the condition of super-pixel, each basic operation unit is not simply represented by pixel information any more, and the mutual relation among the basic units is changed correspondingly, so that the segmentation model represented by the super-pixel basic unit needs to be redefined.
Obtaining M superpixels by SLIC over-segmentation method, M is usually set to 250, and each superpixel riRepresents a region and has i ∈ M, riThe number of pixels in the region is O, and the region r is definediThe center of gravity represents the superpixel coordinates, and it can be:
Figure BDA0001426911540000181
Figure BDA0001426911540000182
in the above formula, the first and second carbon atoms are,
Figure BDA0001426911540000183
respectively representing a super pixel riThe abscissa and ordinate of the o-th pixel,
Figure BDA0001426911540000184
Figure BDA0001426911540000185
then it represents a super pixelThe coordinates of (a).
Definition of superpixel coordinates we can derive the physical distance feature f between two superpixelsdis(r1,r2):
Figure BDA0001426911540000186
Wherein the physical distance characteristic parameter thetaγTypically set to 3.
As defined above with respect to the color distance between superpixel regions, the color distance feature D between two superpixels can be obtainedr(r1,r2) Finally, we define the edge feature f under superpixel conditionsedge(r1,r2):
Figure BDA0001426911540000187
In the present invention, the edge feature parameter θα,θβSet to 20 and 33 respectively. From the above definitions, we determine the superpixels and their feature representations in the segmentation model.
First, we define a random variable
Figure BDA0001426911540000191
Figure BDA0001426911540000192
Each indicates a segmentation label, 0 is the background, and 1 is the foreground. Based on an image segmentation energy function of a full-connection conditional random field, an energy function of the full-connection conditional random field under video segmentation is defined as follows:
Figure BDA0001426911540000193
therein, ΨiFor data items, ΨijFor intra-frame smoothing terms, [ phi ]ijFor the inter-frame smoothing term, M and N represent the total number of superpixels in the current frame and the neighboring frame, respectively. Compared with the conventional image segmentationThe energy function in the random field differs in the method with respect to the full-link condition, energy function Esuper-pixel(y) adding an inter-frame smoothing term phiij(ii) a In fact, only using the data item and the intra-frame smoothing item to solve the energy function optimization can obtain a relatively accurate single-frame video segmentation result. However, in a video scene, the foreground objects between adjacent frames do not have large deformation generally, so that the influence of jitter, noise interference and the like generated in the video segmentation process can be prevented by increasing the connection relation in the time dimension, and a space-time continuous video segmentation result is obtained.
According to the super pixel region riWithin each pixel qoNormalized target foreground probability value Θfore-F(qo) The super pixel region r can be calculatediForeground probability of
Figure BDA0001426911540000194
Comprises the following steps:
Figure BDA0001426911540000195
further, a data item definition Ψ may be derivediComprises the following steps:
Figure BDA0001426911540000196
in a segmented scene, the probability that two areas with the closer relative positions belong to the same label is higher, and the probability that the labels of the areas with the more similar colors are the same is also higher; at the same time, at the same edge of the region, the classification tags also present a consistent state with a greater probability. Therefore, the physical distance characteristic f is considereddis(ri,rj) Color distance feature Dr(ri,rj) And edge feature fedge(ri,rj) The intra smoothing term definition Ψ can be derivedij
Ψij(ri,rj)=w1fdis(ri,rj)+w2Dr(ri,rj)+w3fedge(ri,rj) i,j∈M
Wherein, w1、w2、w3Each represents the proportion of different features in the intra smoothing term, and is typically set to 6, 10, and 2, respectively.
In addition, the interframe smoothing term is used for improving the smoothness of the segmentation result and ensuring that the output result has no jitter, so the interframe smoothing term defines phiij
Figure BDA0001426911540000205
Figure BDA0001426911540000201
alpha is a time domain connection distance characteristic parameter and is set to be 4.0 in the invention.
Figure BDA0001426911540000202
Is a neighboring frame region rjThe label of (1).
In the video segmentation process, the segmentation algorithm may cause a large deviation of the segmentation result due to the particularity of the video scene and the contingency of the actual operation of the program. At this time, if a video segmentation algorithm based on full video frames is used, not only a low-quality video segmentation result is generated, but also the algorithm cannot reach a stable state again through external interference operation. The super-pixel-based space-time continuous random field video segmentation method provided by the invention uses an algorithm framework for carrying out target segmentation frame by frame, and the strategy effectively avoids the problems. Therefore, when each frame of target is divided, only an inter-frame smoothing term between the current frame and the previous frame of video is considered in the energy function. Meanwhile, as the target segmentation result in the last frame is known, when the energy function is solved, the energy function is used
Figure BDA0001426911540000203
To calculate an inter-frame smoothing term
Figure BDA0001426911540000204
And finally, the super-pixel-based space-time continuous full-connection conditional random field video frame segmentation result can be deduced by using rapid high-dimensional Gaussian filtering.
As shown in fig. 1, a video segmentation process of a video bear according to an embodiment of the present invention includes the following steps:
(1) the optical flow motion information and the super-pixel pre-segmentation result of the video frame are calculated in a pre-processing mode, because the optical flow motion information and the super-pixel result in the video frame need to be obtained in the implementation process of the method and the calculation of the optical flow motion information and the super-pixel result can be completed before all processing, the optical flow motion information and the super-pixel result of the video bear are calculated in the pre-processing stage;
(2) when the first frame of the bear video is read, because the invention does not know what the target to be segmented is, an interactive interface needs to be provided, and the target to be segmented is given artificially, as shown in fig. 2, the target is the first frame of the bear video, and the rectangular frame in the picture is marked by the person. After the position and size information of the rectangular frame is obtained through man-machine interaction, the information is displayed in the image. After an initial target frame is obtained, a target foreground is calibrated through the target frame and segmentation is completed to obtain a target segmentation result; initializing a multi-scale tracking model by using the position and size information of the target frame on the basis of the target segmentation result;
(3) tracking and correcting and updating tracking model parameters of a current frame target, from a target frame in a previous frame, a rough target frame in the current frame can be obtained by using a multi-scale tracking model, however, even the most elegant video tracking algorithm cannot completely capture the target under the condition that the target is changed drastically, so here, the invention proposes to calculate video motion information by using an optical flow algorithm to assist in correcting the tracking result:
first, pixel velocity intensity, velocity gradient, and velocity intensity gradient in the light flow map are calculated. Determining stable motion pixels in the image according to the calculation result; counting the proportion of the stable motion pixels in the approximate target frame, if the proportion is smaller than an empirical threshold, judging that no stable motion area exists in the approximate target frame, and at the moment, the target tends to have no change, namely the video tracking operation obtains quite accurate target scale and position information; if the ratio is greater than the empirical threshold, it is determined that a stable motion area exists in the approximate target frame, and the optical flow may be used to correct the target frame to find a bounding rectangle of the stable motion pixels in the current target frame. Respectively comparing the tracking target frame and the optical flow motion frame with a circumscribed rectangle frame of a previous frame segmentation result, and selecting a proper window as a target frame according to the given criteria;
finally, amplifying the target frame in a certain proportion, and identifying the result as target scale and position information; as shown in fig. 3, the results of each stage in the foreground object dimension and position calculation step in the bear video are shown. In the figure, a black window is a video tracking result, a red window is an effective optical flow area, and a blue window is a target frame for segmentation after correction;
(4) extracting the significance of a strong target constraint video, and establishing a new graph structure of a current frame image by taking a super pixel as a basic unit according to a super pixel result; in the new graph structure, each node is represented by a super pixel, the position of the node is the geometric center of the super pixel, and the color distance between the node and other nodes can be obtained by accumulating the difference value of each pixel in two super pixel areas; meanwhile, according to the target scale, the current frame is divided into 3 saliency areas with different scales, and the closer the super-pixel distance to the center of the target, the larger the acquired saliency weight is; finally, adding a target Gaussian mixture model estimated from the segmentation result of the first frame and the last frame, and solving a final significance result; FIG. 4 shows the result of the saliency calculation for a video bear in a second frame;
(5) estimating a rough label mask image and a front/background prior probability model, dividing a video significance result into three types of labels (a background area, an unknown area and a foreground area) by using a threshold value limiting and histogram segmentation algorithm, estimating the rough label mask image of a current frame, and further respectively constructing a foreground Gaussian mixture model and a background Gaussian mixture model by using pixels of the foreground label area and pixels of the background label area; in the video segmentation problem, the color change part is usually the background, and the foreground color change amplitude is small, so the method can more accurately calculate the prior probability model of the front/background of the current frame by adopting a weighting mode of a front/background Gaussian mixture model in combination with the target segmentation result of the first frame;
(6) and constructing a spatial-temporal continuous full-connection condition random field model based on the superpixel, rapidly optimizing and solving, and establishing a spatial-temporal continuous full-connection condition random field graph structure based on the superpixel according to the video segmentation model. Then, the graph structure is solved by using a fast high-dimensional Gaussian filter algorithm, and the segmentation result of the current frame is obtained. The segmentation algorithm is used in the sample video bear, and the obtained result is shown in fig. 5;
(7) and (5) keeping the segmentation result and the target frame of the current frame, reading the video image of the next frame and the optical flow motion information thereof, and repeating the steps (3) to (7) until the video segmentation is finished.
It will be appreciated by those skilled in the art that the foregoing is only a preferred embodiment of the invention, and is not intended to limit the invention, such that various modifications, equivalents and improvements may be made without departing from the spirit and scope of the invention.

Claims (7)

1. A video segmentation method based on strong target constraint video saliency is characterized by comprising the following steps:
(1) calculating the optical flow motion information and the super-pixel segmentation result of the video frame; the calculating the super-pixel segmentation result specifically comprises:
(16) the video frame is segmented by adopting an SLIC over-segmentation method to obtain M super-pixels riRepresenting a superpixel, i ∈ M, riThe number of pixels in the region is O, and the region riThe center of gravity of (a) represents a superpixel coordinate, it can be found that:
Figure FDA0002305642490000011
Figure FDA0002305642490000012
wherein the content of the first and second substances,
Figure FDA0002305642490000013
and
Figure FDA0002305642490000014
respectively representing a super pixel riThe abscissa and ordinate of the o-th pixel within the region,
Figure FDA0002305642490000015
and
Figure FDA0002305642490000016
then the horizontal and vertical coordinates of the super pixel are represented;
(17) super pixel r1And r2The physical distance between the two is characterized by:
Figure FDA0002305642490000017
wherein, thetaγRepresenting a physical distance characteristic parameter;
(18) super pixel r1And r2The color distance between the two is characterized by:
Figure FDA0002305642490000018
wherein n is1And n2Respectively representing the number of respective color types in the two super pixel areas; f (c)kI) represents the probability that the color of the ith pixel in the kth super-pixel region appears in the region; d (c)1,i,c2,j) Represents a pixel c1,iAnd c2,jThe Euclidean distance of the color;
(19) the edge characteristics under superpixel conditions are:
Figure FDA0002305642490000021
wherein, thetaαAnd thetaβIs an edge feature parameter;
(2) performing interactive segmentation on a first frame picture in a video frame sequence to obtain a target frame; calibrating a target foreground through a target frame and completing segmentation to obtain a target segmentation result; initializing a multi-scale tracking model by using the position and size information of the target frame on the basis of the target segmentation result;
(3) reading a next frame of image, obtaining a target frame of a current frame by using a multi-scale tracking model, correcting the target frame of the current frame by using current frame optical flow motion information and a target segmentation result of a previous frame, and updating the multi-scale tracking model by using the target position and scale information of the corrected target frame;
(4) acquiring a target color model of the current frame according to the target segmentation results of the first frame and the previous frame; integrating the target color model, the target position and the scale information into image significance in a strong target constraint mode, and calculating to obtain strong target constraint video frame significance;
(5) threshold value limiting and histogram segmentation operations are carried out on the video frame significance result to obtain a rough label mask image of the current frame, and a prior probability model of the front/background of the current frame is calculated by combining the target segmentation result of the first frame;
(6) establishing a spatial-temporal continuous full-connection conditional random field model based on superpixels, defining data items by using a front/background prior probability model of a current frame, and defining an intra-frame smoothing item and an inter-frame smoothing item by combining color distances, spatial distances and edge relations among the superpixels; performing optimization solution by using a fast high-dimensional Gaussian filtering algorithm to obtain a target segmentation result of the current frame;
(7) and (5) repeating the steps (3) to (6) until the video segmentation is finished.
2. The method for video segmentation based on strong target constraint video saliency as claimed in claim 1, wherein the calculation process of the optical flow motion information in step (1) includes:
(11) computing the optical flow gradient strength at pixel q in the video frame:
Figure FDA0002305642490000022
wherein λ ismRepresenting an intensity parameter;
Figure FDA0002305642490000023
represents the gradient value at pixel q;
(12) calculating the maximum value of the difference of the motion directions of each pixel and the adjacent pixels in the video frame:
Figure FDA0002305642490000031
wherein the content of the first and second substances,
Figure FDA0002305642490000032
representing the angular difference between pixel q and pixel q'; qqIs the neighborhood of pixel q; lambda [ alpha ]θRepresenting an angle difference parameter;
(13) the pixel velocity difference in video frames is:
Figure FDA0002305642490000033
wherein, TmIndicating a decision threshold
(14) If b is judgedqIf the difference is larger than the difference threshold value, if so, the pixel q is a contour pixel of the motion area;
(15) starting from the direction of 12 points by taking any pixel in a video frame as a reference, emitting a ray clockwise every 45 degrees to obtain 8 rays, respectively counting the intersection times of the ray and a contour pixel, and if more than 4 odd rays intersected with the contour pixel exist, judging the pixel as an effective motion pixel; otherwise, the pixel is determined to be a noisy pixel.
3. The method according to claim 1, wherein the correcting the current frame target frame in step (3) using the optical flow motion information and the target segmentation result of the previous frame specifically comprises:
(31) obtaining effective motion pixels in the current frame image;
(32) counting the proportion of the effective moving pixels in the current frame target frame, and if the proportion is greater than a proportion threshold value, entering a step (33); otherwise, entering a step (34);
(33) respectively comparing the circumscribed rectangle of the effective motion pixels in the current frame target frame and the current frame target frame with the circumscribed rectangle of the previous frame target segmentation result; judging the smaller difference as the correction result; if the two and the circumscribed rectangle of the last frame of the segmentation result are both larger than the maximum difference threshold value, using the circumscribed rectangle of the last frame of the segmentation result as a correction result; entering a step (35);
(34) judging a current frame target frame as a correction result;
(35) the length and the width of the correction result are respectively expanded by the multiple proportion, and if the expanded correction result is smaller than the original image, the correction result is returned as a current frame target frame; otherwise, the correction result is expanded to the size of the original image and then returned as the current frame target frame.
4. The method according to claim 1, wherein the obtaining the target color model of the current frame according to the target segmentation result of the first frame and the previous frame in the step (4) specifically comprises:
(41) establishing a Gaussian mixture model for the target foreground pixels of the first frame to obtain a color model Hfore-1(ii) a Respectively establishing a mixed Gaussian model for the foreground and the background of the target in the previous frame to obtain a foreground color model Hfore-2And a background color model Hback-2
(42) The target foreground color model of the current frame is:
Figure FDA0002305642490000041
of the current frameThe color model of the target background is Hback=Hback-2(ii) a Wherein the content of the first and second substances,
Figure FDA0002305642490000042
weighting coefficients for the color model;
(43) super pixel qkForeground probability value Hfore-F(qk) Or probability value H of the backgroundback-F(qk) Comprises the following steps:
Figure FDA0002305642490000043
Hback-F(qk)=1-Pfore(qk);
wherein, the foreground color model HforeAnd a background color model HbackFurther obtaining a super pixel q in the current framekProbability H of belonging to the foreground respectivelyfore(qk) And probability of belonging to the background Hback(qk)。
5. The video segmentation method based on the saliency of the strong target constraint video according to claim 1, wherein the saliency of the strong target constraint video frame calculated in the step (4) is specifically:
the closer the region is to the target color model, the stronger the saliency, and thus the video frame saliency:
Figure FDA0002305642490000044
wherein D iss(rk,ri) Representing a super pixel rkRegion and super pixel riThe center of gravity distance of the region; sigmasIs a distance coefficient; w (r)i) Is a super pixel riFor super pixel rkWeight of significance, riThe more the number of pixels in the region, the greater the influence; dr(rk,r2) Is rkAnd riThe color distance of (d);
for renSuper pixel rkRegion, target frame b, tiI is 1,2,3 is the empirical weight, T1And T2Is an empirical threshold, dis (r)kAnd b) obtaining the distance weight w of the superpixel from the center of the image as a function of the distance between the regionss(rk):
Figure FDA0002305642490000051
Sv(rk) The color model weights in (1) are:
Figure FDA0002305642490000052
wherein, the Hfore-F(rk) Representing a super pixel rkThe foreground probability value of (1).
6. The method for video segmentation based on strong target constraint video saliency as claimed in claim 1, wherein said step (5) specifically comprises:
(51) the region saliency is classified into three classes of labels by a threshold definition and histogram segmentation algorithm:
Figure FDA0002305642490000053
for each super pixel region rkSetting label (r)k) Expressed as background, unknown and foreground regions using the values 0, 1 and 2, respectively; sv(rk) Representing a video frame saliency result; setting a defined threshold TbasicWhen the super pixel rkIs less than TbasicThen r iskThe area is a background area; t ishisIs a significant mean value, greater than ThisIs marked as a foreground region; the remaining part is marked as an unknown region;
(52) calculating a prior probability model of the front/background of the current frame:
Θfore=Θfore-1+ρΘfore-S
Θback=Θback-S
wherein rho is a color model weighting coefficient; thetafore-SRepresenting a foreground mixed Gaussian model constructed by the foreground region pixels; thetaback-SRepresenting a background mixed Gaussian model constructed by the pixels in the background area; thetafore-1A Gaussian mixture model representing the foreground of the first frame of the target;
(53) normalizing the foreground probability value Θ of the current framefore-FComprises the following steps:
Figure FDA0002305642490000061
7. the video segmentation method based on strong target constraint video saliency as claimed in claim 1, wherein said spatial-temporal continuous full-connected conditional random field model based on superpixels in step (6) is specifically:
defining random variables
Figure FDA0002305642490000062
Figure FDA0002305642490000063
Representing a super pixel region riThe segmentation tags of (a) are set,
Figure FDA0002305642490000064
0 is background and 1 is foreground; establishing a spatial-temporal continuous full-connection conditional random field model based on superpixels:
Figure FDA0002305642490000065
therein, ΨiIs a data item; ΨijIs an intra smoothing term; phiijIs an inter-frame smoothing term; m and N respectively represent the total number of superpixels in the current frame and the adjacent frame;
super pixel riForeground probability of
Figure FDA0002305642490000066
Comprises the following steps:
Figure FDA0002305642490000067
wherein, thetafore-F(qo) Representing the current frame pixel qoA foreground probability value of (d); super pixel riThe number of pixels in the region is O;
data item ΨiComprises the following steps:
Figure FDA0002305642490000068
intra smoothing term definition ΨijComprises the following steps:
Ψij(ri,rj)=w1fdis(ri,rj)+w2Dr(ri,rj)+w3fedge(ri,rj)i,j∈M,
wherein f isdis(ri,rj) Representing a super pixel ri,rjA physical distance feature therebetween; dr(ri,rj) Representing a super pixel ri,rjA color distance feature therebetween; f. ofedge(ri,rj) Representing a super pixel ri,rjEdge features in between; w is a1、w2And w3Respectively represents fdis(ri,rj)、Dr(ri,rj) And fedge(ri,rj) The proportion occupied in the intraframe smoothing term;
inter-frame smoothing term ΦijComprises the following steps:
Figure FDA0002305642490000071
Figure FDA0002305642490000072
wherein the content of the first and second substances,
Figure FDA0002305642490000073
is a neighboring frame region rjα is a time domain connection distance characteristic parameter.
CN201710946156.2A 2017-09-30 2017-09-30 Video segmentation method based on strong target constraint video saliency Expired - Fee Related CN107644429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710946156.2A CN107644429B (en) 2017-09-30 2017-09-30 Video segmentation method based on strong target constraint video saliency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710946156.2A CN107644429B (en) 2017-09-30 2017-09-30 Video segmentation method based on strong target constraint video saliency

Publications (2)

Publication Number Publication Date
CN107644429A CN107644429A (en) 2018-01-30
CN107644429B true CN107644429B (en) 2020-05-19

Family

ID=61113582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710946156.2A Expired - Fee Related CN107644429B (en) 2017-09-30 2017-09-30 Video segmentation method based on strong target constraint video saliency

Country Status (1)

Country Link
CN (1) CN107644429B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108550159B (en) * 2018-03-08 2022-02-15 佛山市云米电器科技有限公司 Flue gas concentration identification method based on image three-color segmentation
CN108898618B (en) * 2018-06-06 2021-09-24 上海交通大学 Weak surveillance video object segmentation method and device
CN110830846B (en) * 2018-08-07 2022-02-22 阿里巴巴(中国)有限公司 Video clipping method and server
CN109447082B (en) * 2018-08-31 2020-09-15 武汉尺子科技有限公司 Scene moving object segmentation method, system, storage medium and equipment
CN109345484A (en) * 2018-09-30 2019-02-15 北京邮电大学 A kind of depth map restorative procedure and device
CN109615640B (en) * 2018-11-19 2021-04-30 北京陌上花科技有限公司 Related filtering target tracking method and device
CN109829449B (en) * 2019-03-08 2021-09-14 北京工业大学 RGB-D indoor scene labeling method based on super-pixel space-time context
CN109978891A (en) * 2019-03-13 2019-07-05 浙江商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110148158A (en) * 2019-05-13 2019-08-20 北京百度网讯科技有限公司 For handling the method, apparatus, equipment and storage medium of video
CN110163873B (en) * 2019-05-20 2023-02-24 长沙理工大学 Bilateral video target segmentation method and system
CN110176027B (en) * 2019-05-27 2023-03-14 腾讯科技(深圳)有限公司 Video target tracking method, device, equipment and storage medium
CN110278415B (en) * 2019-07-02 2020-04-28 浙江大学 Method for improving video quality of network camera
CN110390293B (en) * 2019-07-18 2023-04-25 南京信息工程大学 Video object segmentation algorithm based on high-order energy constraint
CN110532922B (en) * 2019-08-21 2023-04-14 成都电科慧安科技有限公司 Method for real-time segmentation of depth map video frames on mobile device
CN111275696B (en) * 2020-02-10 2023-09-15 腾讯医疗健康(深圳)有限公司 Medical image processing method, image processing method and device
CN111340852B (en) * 2020-03-10 2022-09-27 南昌航空大学 Image sequence optical flow calculation method based on optimized semantic segmentation
CN111539993B (en) * 2020-04-13 2021-10-19 中国人民解放军军事科学院国防科技创新研究院 Space target visual tracking method based on segmentation
CN112085760B (en) * 2020-09-04 2024-04-26 厦门大学 Foreground segmentation method for laparoscopic surgery video
CN113297990B (en) * 2021-05-28 2023-03-14 西安理工大学 Human foot moving object detection method based on Gaussian mask light stream
CN114494297B (en) * 2022-01-28 2022-12-06 杭州电子科技大学 Adaptive video target segmentation method for processing multiple priori knowledge
CN115082866B (en) * 2022-08-19 2022-11-29 江苏南通二建集团讯腾云创智能科技有限公司 Intelligent fire-fighting fire identification method for building
CN116309565B (en) * 2023-05-17 2023-08-04 山东晨光胶带有限公司 High-strength conveyor belt deviation detection method based on computer vision
CN116342629A (en) * 2023-06-01 2023-06-27 深圳思谋信息科技有限公司 Image interaction segmentation method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709472A (en) * 2017-01-17 2017-05-24 湖南优象科技有限公司 Video target detecting and tracking method based on optical flow features
CN106780376A (en) * 2016-12-07 2017-05-31 中国农业科学院农业信息研究所 The background image dividing method of partitioning algorithm is detected and combined based on conspicuousness
CN106952286A (en) * 2017-03-21 2017-07-14 中国人民解放***箭军工程大学 Dynamic background Target Segmentation method based on motion notable figure and light stream vector analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345644B (en) * 2013-06-17 2016-08-24 华为终端有限公司 The object detection method of on-line training and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780376A (en) * 2016-12-07 2017-05-31 中国农业科学院农业信息研究所 The background image dividing method of partitioning algorithm is detected and combined based on conspicuousness
CN106709472A (en) * 2017-01-17 2017-05-24 湖南优象科技有限公司 Video target detecting and tracking method based on optical flow features
CN106952286A (en) * 2017-03-21 2017-07-14 中国人民解放***箭军工程大学 Dynamic background Target Segmentation method based on motion notable figure and light stream vector analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video Segmentation Based on Strong Target Constrained Video Saliency;Long Zhang et al.;《2017 2nd International Conference on Image, Vision and Computing》;20170720;正文第356-358页 *

Also Published As

Publication number Publication date
CN107644429A (en) 2018-01-30

Similar Documents

Publication Publication Date Title
CN107644429B (en) Video segmentation method based on strong target constraint video saliency
CN109241913B (en) Ship detection method and system combining significance detection and deep learning
US8280165B2 (en) System and method for segmenting foreground and background in a video
CN110688987B (en) Pedestrian position detection and tracking method and system
CN107273905B (en) Target active contour tracking method combined with motion information
CN105184763B (en) Image processing method and device
CN111539273A (en) Traffic video background modeling method and system
CN106570486A (en) Kernel correlation filtering target tracking method based on feature fusion and Bayesian classification
CN107968946B (en) Video frame rate improving method and device
US10249046B2 (en) Method and apparatus for object tracking and segmentation via background tracking
CN105184802B (en) A kind of method and device of image procossing
CN113362341B (en) Air-ground infrared target tracking data set labeling method based on super-pixel structure constraint
CN111161313A (en) Multi-target tracking method and device in video stream
CN107507223A (en) Method for tracking target based on multi-characters clusterl matching under dynamic environment
CN107871315B (en) Video image motion detection method and device
CN115439803A (en) Smoke optical flow identification method based on deep learning model
CN110111239B (en) Human image head background blurring method based on tof camera soft segmentation
CN105631405A (en) Multistage blocking-based intelligent traffic video recognition background modeling method
CN103337082B (en) Methods of video segmentation based on Statistical Shape priori
JP3716455B2 (en) Region extraction method and region extraction device
CN108573217B (en) Compression tracking method combined with local structured information
Ghosh et al. Robust simultaneous registration and segmentation with sparse error reconstruction
Chen et al. Illumination-invariant video cut-out using octagon sensitive optimization
CN115409954A (en) Dense point cloud map construction method based on ORB feature points
Jabid et al. An edge-texture based moving object detection for video content based application

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200519

Termination date: 20200930

CF01 Termination of patent right due to non-payment of annual fee