US20140241582A1 - Digital processing method and system for determination of object occlusion in an image sequence - Google Patents

Digital processing method and system for determination of object occlusion in an image sequence

Info

Publication number
US20140241582A1
US20140241582A1 (application US14/217,655)
Authority
US
United States
Prior art keywords
image
occlusion map
motion
occlusion
regularized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/217,655
Other versions
US8831288B1
Inventor
William L. Gaddy
Vidhya Seran
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHEYTEC TECHNOLOGIES, LLC
Original Assignee
Spinella IP Holdings Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spinella IP Holdings Inc filed Critical Spinella IP Holdings Inc
Priority to US14/217,655
Assigned to SPINELLA IP HOLDINGS, INC. reassignment SPINELLA IP HOLDINGS, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE APPLICATION NUMBER FROM "14/063,704" TO "14/217,655" PREVIOUSLY RECORDED ON REEL 033128 FRAME 0513. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT. Assignors: GADDY, WILLIAM L., SERAN, VIDHYA
Assigned to A2ZLOGIX, INC. reassignment A2ZLOGIX, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SPINELLA IP HOLDINGS, INC.
Publication of US20140241582A1
Application granted
Publication of US8831288B1
Assigned to CHEYTEC TECHNOLOGIES, LLC reassignment CHEYTEC TECHNOLOGIES, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: A2ZLOGIX, INC.
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G06K9/00624
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/254Analysis of motion involving subtraction of images

Definitions

  • the present invention relates generally to digital image processing, and more particularly, to a method and system for automatic determination of the presence, location, and type of occlusion regions between a pair of images.
  • FIG. 1 shows an example of an image pair 100 a , 100 b , with background 105 and foreground 110 , where a foreground object 115 is in motion and which shows an occlusion region 120 and a disocclusion region 125 .
  • When a foreground object 115 is in motion in a video sequence, background pixels of the image 100 b in the forward-motion direction are hidden (known herein as occlusion or the occlusion region 120) while background pixels of the image 100 b behind the motion are revealed (known herein as disocclusion or the disocclusion region 125).
  • In the occluded areas of an image, there is no definite motion attributable to the background; concomitantly, there is no definite motion attributable to the foreground object in disoccluded regions of the image.
  • These two types of areas within a pair of images (collectively known herein as occlusion regions) are very problematic for motion estimation in general, and for many optical flow systems in particular, because erroneous motion vector values in these regions tend to propagate into non-occlusion regions, adversely affecting the overall accuracy of the optical flow estimation. Determination of occlusion regions has many benefits for other high-value video analysis tasks in addition to improvement of optical flow and motion estimation, such as disparity and depth estimation, image segmentation, object identification, and 3D conversion and projection.
  • Occlusion has received much attention in the context of motion estimation, depth estimation and image/video segmentation.
  • Occlusion can be estimated or computed explicitly or implicitly.
  • Occlusion boundaries themselves provide strong cues for 3D scene reconstruction. Methods as described in A. Saxena, M. Sun, and A. Y. Ng, “Make 3D: Learning 3D Scene structure from a Single Image,” PAMI, 31: 824-840, 2009, and in D. Hoiem, A. A. Efros, and A. Hebert, “Recovering Occlusion Boundaries from an Image,” International Journal on Computer Vision, pages 1-19, 2010, propose to find occlusion boundaries using a single frame by over-segmentation and supervised-learning.
  • With no motion information, occlusion boundary detection is an inherently ambiguous problem.
  • Other methods attempt to layer input video into flexible sprites to infer occluded pixels/regions (see e.g., N. Jojic and B. J. Frey, “Learning Flexible Sprites in Video layers,” in CVPR, 2001).
  • Layered methods provide realistic modeling of occlusion boundaries, but these methods need to have continuous regions, relative order of surfaces, and predetermined motion.
  • the method described in Sun, D., Sudderth, E. B., Black, M. J., "Layered image motion with explicit occlusions, temporal consistency, and depth ordering," in: Advances in Neural Information Processing Systems, pp. 2226-2234 (2010), explicitly models occlusion and the results obtained are relatively accurate, but the method possesses a huge computational load.
  • In Alvarez et al., “Symmetrical dense optical flow estimation with occlusions detection,” International Journal of Computer Vision 75(3), 371-385 (2007), (hereinafter, Alvarez), passing interest is focused on the role of the diffusion tensor and subsequent eigenvalue analysis, but this is only used to analyze the forward and backward symmetry of the optical flow solution, and not used to directly improve the accuracy of either the optical flow computation or the occlusion computation.
  • Ince S., Konrad, J., “Occlusion-aware optical flow estimation,” IEEE Trans. Image Processing 17(8), 1443-1451 (2008), (hereinafter, “Ince”), discloses a method and systems for joint determination of optical flow and occlusion, but the systems are coupled and this method is not applicable for coupling to a non-optical-flow motion estimation system, such as block matching. Further, Ince ignores the notion of either a diffusion tensor or structure tensor of the images in order to improve robustness.
  • Motion cues are very important for identifying occlusion regions and boundaries.
  • the objective of any motion estimation is to compute a flow field that represents the motion of points in two consecutive frames, and the most accurate motion estimation techniques should be able to handle occlusions.
  • Some occlusion detection work based on motion as described in Alvarez and Ince jointly estimates backward and forward motion and marks inconsistent pixels as occluded regions. In such circumstances, occlusion is detected implicitly and the occlusion detection is coupled with the motion estimation method itself. These methods encounter problems within highly textured imagery areas and do not succeed with large displacements or occlusion regions.
  • a processing device receives a first image and a second image.
  • the processing device estimates a field of motion vectors between the first image and the second image.
  • the processing device motion compensates the first image toward the second image to obtain a motion-compensated image.
  • the processing device compares a plurality of pixel values of the motion-compensated image to a plurality of pixels of the first image to estimate an error field.
  • the processing device inputs the error field to a weighted error cost function to obtain an initial occlusion map.
  • the processing device regularizes the initial occlusion map to obtain a regularized occlusion map.
  • regularizing may further comprise obtaining a regularized error field.
  • comparing and regularizing may be repeated until a value based on at least one of the regularized occlusion map or the regularized error field is below a threshold value.
  • motion compensating the first image toward the second image comprises image warping the field of motion vectors from the first image toward the second image.
  • the initial occlusion map and the regularized occlusion map are each based on a weighted error cost function.
  • the weighted error cost function may be at least one of a sum-of-square differences measure, a locally scaled sum-of-square differences measure, a normalized cross-correlation measure, or a zero-mean normalized cross-correlation measure.
  • the weighted error cost function may be based on a local weighting over a local region of support.
  • the local weighting over a local region of support may be based on an eigensystem analysis of the local structure tensor of the motion-compensated image.
  • the local weighting over a local region of support is a gradient-energy weighting over the local region of support.
  • the gradient-energy weighting over a local region of support may be a sum of statistical variance or local contrast over the local region of support.
  • regularizing the occlusion map to obtain a regularized occlusion map may comprise applying a multi-sigma regularization to the occlusion map.
  • Applying a multi-sigma regularization to the occlusion map may comprise applying a 4-factor sigma filter to the occlusion map.
  • Input weights for the multi-factor sigma filter may comprise an initial coarse occlusion field estimate and, between the first image and the second image, one or more of similarities of color value or luminance, similarities of circular values of motion vector directions, or similarities of motion vector magnitudes.
  • the multi-factor sigma filter may incorporate one or more weights, such as depth, or discontinuities of a range-to-target field.
  • Examples of the present disclosure provide a method and system for detecting and characterizing occlusion regions without any assumptions that depend on scene types, motion types, or supervised learning datasets. Examples of the present disclosure provide an accurate and precise occlusion region map. The occlusion detection is decoupled from the motion estimation itself, providing for flexible addition to any suitable optical flow or motion estimation system or method.
  • FIG. 1 shows an example of an image pair with background and foreground, where the foreground object is in motion and shows occlusion and disocclusion regions.
  • FIG. 2 is a block diagram of an example computing system for detecting one or more occlusion regions in an image sequence, in which examples of the present disclosure may operate.
  • FIG. 3 shows an exemplary occlusion field/map.
  • FIG. 4 is a flow diagram illustrating an example of a method for detecting occlusion regions and/or disocclusion regions in a sequence of images using the computing system of FIG. 2 .
  • FIG. 5 is a block diagram of an example data flow between modules that implement the method of FIG. 4 .
  • FIG. 6 is a block diagram of an example data flow between modules that implement a weighted distance field module of FIG. 5 .
  • FIG. 7 is a block diagram of an example data flow through a 4-factor sigma filter employed in a regularization module of FIG. 5 .
  • FIG. 8 illustrates a diagrammatic representation of a machine in the example form of a computer system within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.
  • Motion estimation has been extensively explored in the related art and a determination of accurate motion vectors between images is still a challenging problem.
  • Several solutions have historically been used for motion estimation, such as simple block matching, hierarchical block matching, and optical flow estimation.
  • Occlusion detection is also very important in estimating a disparity map for stereo matching.
  • Occlusion marking has independent applications beyond motion estimation and disparity map estimation, such as in video surveillance object sorting/object removal and re-projection of multi-view video and imagery. Also, marking occlusions is very useful for image segmentation, motion segmentation, image in-painting, and disparity in-painting.
  • Occlusion occurs under one of the following conditions: a camera or capture system zooms in or out, a new object appears in-frame, an old object disappears from frame, or moving foreground objects reveal background pixels. For example, in stereoscopic image pairs, two images are captured from different angles and some pixels exist in only one view. As used herein, these pixels are known as occluded pixels. Similarly, in video surveillance, a person moving away from a camera or a new person appearing in front of the camera introduces occlusion.
  • FIG. 2 is a block diagram of an example computing system 200 for detecting one or more occlusion regions in an image sequence, in which examples of the present disclosure may operate.
  • the computing system 200 receives data from one or more data sources 205 , such as a video camera or a still camera or an on-line storage device or transmission medium.
  • the computing system 200 may also include a digital video capture system 210 and a computing platform 215 .
  • the digital video capturing system 210 processes streams of digital video, or converts analog video to digital video, to a form which can be processed by the computing platform 215 as data source 205 .
  • the computing platform 215 comprises a host system 220 which may comprise, for example, a processing device 225 , such as one or more central processing units 230 a - 230 n .
  • the processing device 225 is coupled to a host memory 235 .
  • the processing device may further implement a graphics processing unit 240 (GPU).
  • other co-processor architectures may be utilized besides GPUs, such as, but not limited to, DSPs, FPGAs, or ASICs, or adjunct fixed-function features of the processing device 225 itself.
  • the GPU 240 may be collocated on the same physical chip or logical device as the central processing units 230 a - 230 n , also known as an “APU”, such as found on mobile phones and tablets. Separate GPU and CPU functions may be found on computer server systems where the GPU is a physical expansion card, and personal computer systems and laptops.
  • the GPU 240 may comprise a GPU memory 237 .
  • the host memory 235 and GPU memory 237 may also be collocated on the same physical chip(s) or logical device, such as on an APU.
  • the processing device 225 is configured to implement an occlusion map generator 245 for detecting occlusion regions and/or disocclusion regions in a sequence of images.
  • the occlusion map generator 245 may be configured to receive data (e.g., a first image and a second image) from the data source 205 , and to receive an image data buffer 250 , which is transferred to the GPU memory 237 as image buffer 255 .
  • the processing device 225 may implement the occlusion map generator 245 as a component of the GPU 240 .
  • the occlusion map generator 245 is configured to obtain a regularized occlusion map from the image buffer 255 as shown in FIG. 3 .
  • the totality of occluded regions for a given image is referred to as an occlusion map.
  • the regularized occlusion map may be displayed on a display 270 .
  • the occlusion map generator 245 may transmit the regularized occlusion map to one or more downstream devices 290 directly or through a network 295 .
  • FIG. 4 is a flow diagram illustrating an example of a method 400 for detecting occlusion regions and/or disocclusion regions in a sequence of images.
  • the method 400 may be performed by a computer system 200 of FIG. 2 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof.
  • the method 400 is performed by the occlusion map generator 245 of the computing system 200 of FIG. 2 .
  • the occlusion map generator 245 estimates a field of motion vectors between a first image and a second image based on the received data from the image buffer 255 .
  • the occlusion map generator 245 operates on the field of motion vectors to motion compensate the first image toward the second image to obtain a motion-compensated image. In one example, when the occlusion map generator 245 motion compensates the first image toward the second image, the occlusion map generator 245 image warps the field of motion vectors from the first image toward the second image.
  • the occlusion map generator 245 compares a plurality of pixel values of the motion-compensated image to a plurality of pixels of the first image to estimate an error field.
  • the occlusion map generator 245 inputs the error field to a weighted error cost function to obtain an initial occlusion map.
  • the weighted error cost function may be at least one of a sum-of-square differences measure, a locally scaled sum-of-square differences measure, a normalized cross-correlation measure, or a zero-mean normalized cross-correlation measure.
  • the weighted error cost function may be based on a local weighting over a local region of support.
  • the local weighting over a local region of support may be based on an eigensystem analysis of the local structure tensor of the motion-compensated image.
  • the local weighting over a local region of support may be a gradient-energy weighting over the local region of support.
  • the gradient-energy weighting over a local region of support may be a sum of statistical variance or local contrast over the local region of support.
  • the occlusion map generator 245 regularizes the initial occlusion map to obtain a regularized occlusion map. In one example, when the occlusion map generator 245 regularizes the initial occlusion map, the occlusion map generator 245 may further regularize the error field.
  • regularizing the occlusion map to obtain a regularized occlusion map may comprise applying a multi-sigma regularization to the occlusion map and the error field.
  • applying a multi-sigma regularization to the occlusion map and error field may comprise applying a 4-factor sigma filter to the occlusion map.
  • if a value based on at least one of the regularized occlusion map or the regularized error field is above a threshold value, the occlusion map generator 245 repeats the comparing and the regularizing steps (i.e., repeating steps 430 - 460 ); otherwise, processing terminates.
  • W t->(t-1) denotes the motion-compensated mapping of frame f(x,y,t ⁇ 1) to frame f(x,y,t).
  • Equation (1) holds true everywhere in the video frame except for the occluded regions o(x,y,t−1).
  • the totality of occluded regions for a given image is referred to as the occlusion map.
  • This map represents a gray scale image mask, or alternatively a 2-dimensional matrix of positive values, that in ideal circumstances accurately identifies each pixel as being occluded, where middle range values can either represent variations in confidence, or alternatively degree of transparency of the occluding object(s).
  • the problems addressed are finding the occluded areas in a sequence of images and performing regularization of the resulting occlusion map to attain temporal stability and to prevent recursive error propagation.
  • FIG. 5 is a block diagram of one example of data flow through a sequence of modules 500 that comprise the occlusion map generator 245 of FIG. 3 .
  • a motion estimation module 508 estimates motion vectors of a motion vector field 502 from a first frame 504 (e.g., a previous frame 504) f(x,y,t−1) to a second frame 506 (e.g., an original frame 506) f(x,y,t).
  • the motion estimation module 508 may implement a motion estimation method which can be, for example, a sub-pixel hierarchical block-based method, optical flow, or recursive disparity estimation for stereoscopic pairs.
  • a motion compensated warping module 510 may apply a motion compensation warping function W t->(t-1) to the motion vector field 502 to obtain a motion warped first frame 512 (e.g., a motion warped previous frame 512 ).
  • a motion compensation warping function W t->(t-1) can be expanded as Eq. 2,
  • the warping method of Eq. 2 can be described as a “scatter” method, whereby not every pixel of the compensated image is guaranteed to be visited, or filled in.
  • By pre-filling the compensated image buffer with a signal value, unvisited regions in the scatter-based warping operation are left with this signal value undisturbed. This, in turn, forms the starting point of the occlusion map 514.
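  • As a rough illustration of this scatter-style warping with a pre-filled signal value, the following sketch (in Python with NumPy; the function name, sentinel value, and rounding to integer target coordinates are illustrative assumptions, not taken from the patent) forward-warps a previous frame along its motion vectors and treats unvisited pixels as the seed of the initial occlusion map:

```python
import numpy as np

def scatter_warp(prev, mv_u, mv_v, sentinel=-1.0):
    """Forward ("scatter") warp of the previous frame along its motion vectors.

    Pixels of the compensated image that are never written keep the sentinel
    value, which seeds the initial occlusion map (compare occlusion map 514).
    """
    h, w = prev.shape
    warped = np.full((h, w), sentinel, dtype=np.float64)    # pre-filled signal value
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.rint(xs + mv_u).astype(int), 0, w - 1)  # target columns
    yt = np.clip(np.rint(ys + mv_v).astype(int), 0, h - 1)  # target rows
    warped[yt, xt] = prev[ys, xs]                            # scatter source pixels onto targets
    occlusion_seed = warped == sentinel                      # unvisited pixels -> candidate occlusions
    return warped, occlusion_seed
```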
  • a weighting function block 516 may apply a weighting function to the motion warped first (e.g., previous) frame 512 to obtain a weighting field 513 which informs subsequent processing steps of the weight to be applied to error cost function analysis for each pixel.
  • the weighting field 513 may be stored in a separate data buffer or in the alpha channel or fourth channel of the motion warped first (e.g., previous) frame 512 .
  • the weighting function may comprise a simple identity function, or something more complex such as Eigensystem analysis of the local structure tensor.
  • a weighted error cost function block 518 may apply a weighted error cost function using the weights supplied by the weighting field 513 .
  • Error pixels from the motion warped first (e.g., previous) frame 512 can be calculated from the weighted error cost function and thereby the occluded areas can be further marked while avoiding the areas already marked.
  • In one example, the weighted error cost function uses the Zero-Mean Normalized Cross-Correlation (ZNCC) over the region of support, as in Eq. 5:

    e(x,y,t) = \sum_{(i,j) \in R} \frac{\bigl(I(x+i, y+j, t) - \bar{I}(x,y,t)\bigr)\,\bigl(\hat{I}(x+i, y+j, t) - \bar{\hat{I}}(x,y,t)\bigr)}{\sigma_{I}(x,y,t)\,\sigma_{\hat{I}}(x,y,t)}  (Eq. 5)

    where Î denotes the motion-warped previous frame 512, ZNCC denotes Zero-Mean Normalized Cross-Correlation, and the overbars and σ terms denote the mean and standard deviation of each image computed over the region of support R centered at (x,y).
  • R is the region of support considered for correlation matching; selecting R as 3×3 pixels may be suitable for real-time processing, and 5×5 may be suitable for offline processing. It will be appreciated by those skilled in the art that, over time, larger regions of support may be employed for real-time and offline processing as the underlying system speed and complexity increase.
  • The 3×3 and 5×5 regions of support are provided only as examples.
  • Correlation-based matching metrics are very computationally expensive, but since motion vectors are already estimated by an external system, the difference metric can be estimated for a smaller region and does not require a search over a larger pixel region of support.
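  • A minimal sketch of such a correlation-based error measure is shown below (Python/NumPy; the function name, the 3×3 default region of support, and the use of 1 − ZNCC as the per-pixel error are illustrative assumptions):

```python
import numpy as np

def zncc_error(current, warped_prev, radius=1, eps=1e-6):
    """Per-pixel error between the current frame and the motion-compensated
    previous frame, using ZNCC over a (2*radius+1) x (2*radius+1) region."""
    h, w = current.shape
    err = np.zeros((h, w))                        # borders left at zero for simplicity
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            a = current[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(float)
            b = warped_prev[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(float)
            za, zb = a - a.mean(), b - b.mean()
            denom = np.sqrt((za * za).sum() * (zb * zb).sum()) + eps
            err[y, x] = 1.0 - (za * zb).sum() / denom   # 0 = perfect match, larger = likely occlusion
    return err
```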
  • an eigensystem analysis can be utilized to provide a more precise and accurate weighting.
  • the methods described in U.S. Pat. No. 8,355,534, incorporated herein by reference, are particularly instructive, but of particular note here is the use of the eigenvalues of the gradient structure tensor of the local region of support to determine whether the region is an isotropic, homogeneous region, one containing significant image texture, or one containing a strong contrast edge. Based upon such a tensor analysis, image differences in homogeneous and isotropic regions would be weighted less than, for example, those in a highly textured region.
  • Optical flow motion vectors and disparity maps commonly use regularization and smoothing steps to smooth discontinuities and outliers, which further helps to stabilize the motion vector fields along the temporal axis in the case of video. It is noted that the occlusion and error fields benefit from the same kind of treatment, applied separately by a regularization module 520, apart from the motion vector field and the image field.
  • weighting function 516 and weighted error cost function 518 may include an eigensystem analysis as depicted in FIG. 6 .
  • a spatio-temporal gradient estimation 630 may be applied to the field of pixels for the previous frame 610 and the current frame 620 as taught in the '534 patent, which results in a two-dimensional gradient field 640 , wherein gradient derivatives may be estimated, for example, in Eq. 9:
  • the gradient field 640 is input into a gradient tensor analysis 650 , where the gradient values are input to a tensor, and the tensor is subjected to eigensystem analysis as in Eq. 10:
  • the eigenvalues obtained from the gradient tensor analysis 650 result in eigenvalue fields 660, which identify the eigenvalues of the local structure tensor for each pixel of the input images 610 and 620.
  • the two Eigenvalues ⁇ 1 and ⁇ 2 for each and every pixel may influence the weighting function 670 by discounting the error values in regions with high homogeneity (e.g. low ⁇ 1 and ⁇ 2) and low edge dominance (e.g. low ⁇ 1 relative to ⁇ 2 ).
  • After computing a weighting field 675 in the region-of-support weighting 670, the weighted error cost function 680 computes a weighted error field 690, as described for the weighted error cost function block 518 of FIG. 5.
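  • One plausible sketch of this eigenvalue-based weighting (Python with NumPy and SciPy; the Gaussian smoothing scale and the use of the total gradient energy λ1+λ2 as the weight are illustrative assumptions, not the patent's prescribed weighting) is:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_weights(img, sigma=1.5, eps=1e-8):
    """Per-pixel eigenvalues of the 2x2 gradient structure tensor, plus a
    weighting field that discounts homogeneous, low-texture regions."""
    gy, gx = np.gradient(img.astype(float))
    jxx = gaussian_filter(gx * gx, sigma)          # smoothed tensor components over
    jyy = gaussian_filter(gy * gy, sigma)          # the local region of support
    jxy = gaussian_filter(gx * gy, sigma)
    tr, det = jxx + jyy, jxx * jyy - jxy * jxy
    disc = np.sqrt(np.maximum(tr * tr / 4.0 - det, 0.0))
    lam1, lam2 = tr / 2.0 + disc, tr / 2.0 - disc  # closed-form 2x2 eigenvalues, lam1 >= lam2
    energy = lam1 + lam2                           # low in homogeneous, isotropic regions
    weights = energy / (energy.max() + eps)        # discount errors where texture is absent
    return lam1, lam2, weights
```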
  • the previous frame 710, the current frame 720, the resulting error field 730, and the initial occlusion map 740 may be input to a multi-factor sigma filter 750 that operates on the error field 730 and the initial occlusion map 740 (similar to the well-known two-factor bilateral filter as taught in Tomasi et al., "Bilateral filtering for gray and color images," International Conference on Computer Vision (1998)). The factors may comprise:
  • an image color distance function (RGB/YUV);
  • a directional distance function for 2D motion vectors based on the weighted distance field 730;
  • a magnitude distance function for 2D motion vectors based on the weighted distance field 730; and
  • occlusion as initially marked in the initial occlusion map 740.
  • the multi-factor sigma filter 750 may be implemented such that when spatial smoothing is applied, if a pixel location is marked as occluded, its contribution to the filter bank coefficients may be penalized heavily, which in turn avoids unwanted distortions at the object boundaries. Additionally, difference data from regions with dissimilar motion, as judged by their associated motion vector directions or magnitudes, may be penalized.
  • the multi-factor sigma filter 750 differs from bilateral filters and their derivatives in many ways, since the originally proposed bilateral filter used only two parameters: spatial radius and image color difference.
  • a multi-factor sigma filter 750 can be represented in equation form as in Eq. 11:
  • e'(x,y,t) = \frac{\sum_{(i,j) \in \Omega} g(i-x,\,j-y,\,t)\; r\bigl(I(i,j,t)-I(x,y,t)\bigr)\; d\bigl(mv_u(i,j,t)-mv_u(x,y,t)\bigr)\; D\bigl(mv_v(i,j,t)-mv_v(x,y,t)\bigr)\; o(i,j,t)\; e(i,j,t)}{\sum_{(i,j) \in \Omega} g(i-x,\,j-y,\,t)\; r\bigl(I(i,j,t)-I(x,y,t)\bigr)\; d\bigl(mv_u(i,j,t)-mv_u(x,y,t)\bigr)\; D\bigl(mv_v(i,j,t)-mv_v(x,y,t)\bigr)\; o(i,j,t)}  (Eq. 11)

    where Ω is the filter's spatial neighborhood and the denominator is the usual sigma-filter normalization by the sum of the weights.
  • e( ) is the error field 690 for the image
  • o( ) represents the initial occlusion field 685 provided by the warping compensation 510
  • e′( ) is the resultant regularized occlusion map 760
  • g( ) is the Gaussian spatial distance function, as in Eq. 12:
  • r( ) of Eq. 11 is the radiosity function, which observes color differences and/or luminance values
  • r( ) of Eq. 11 may be a suitable color difference function based on the RGB or YUV values present in an image I, as in Eq. 13:
  • fC( ) of Eq. 13 may transform the RGB or YUV values to an HSV colorspace representation in one example, as in Eqs. 14-19:
  • f_C(\cdot) = a\left[\frac{\operatorname{atan2}\bigl(H(i,j,t),\,H(x,y,t)\bigr) + \pi}{2\pi}\right] + b\bigl(S(i,j,t) - S(x,y,t)\bigr) + c\bigl(V(i,j,t) - V(x,y,t)\bigr)
  • where a, b, and c are user-supplied weighting values, which by way of a non-limiting example may be 0.5, 0.5, and 1.0, respectively; and where function d( ) of Eq. 11 measures the motion vector similarity, which may include, for example, a simple magnitude difference measurement function as in Eq. 21:
  • d( ) is a function to measure simple Euclidean distance between motion vectors, and D( ) is a function as in Eqs. 22-25, whereby a method to independently evaluate motion vector direction similarities is provided:
  • d(x,y,t) = \log\Bigl\{1.0 + 4.0\,\max\Bigl[\sqrt{mv_x(x,y,t)^2 + mv_y(x,y,t)^2},\;\sqrt{mv_x(x,y,t-1)^2 + mv_y(x,y,t-1)^2}\Bigr]\Bigr\}
  • the regularization of the output occlusion field values o( ) and error field values e′( ) of Eq. 11 does not use just the spatial radius considered, but also takes into account the differences in motion vectors, the image luminance, and the occlusion markings. This excludes the occluded areas from the operation and does not introduce distortions due to imperfect motion estimation vectors.
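  • A simplified sketch of a 4-factor sigma filter in the spirit of Eq. 11 is given below (Python/NumPy; the Gaussian forms of the individual factors, the parameter values, and the use of luminance for r( ) rather than the HSV-based fC( ) above are illustrative assumptions):

```python
import numpy as np

def multi_factor_sigma_filter(err, luma, mv_u, mv_v, occ, radius=3,
                              sigma_s=1.5, sigma_r=10.0, sigma_m=1.0, sigma_d=0.5, eps=1e-8):
    """Regularize the error field with weights for spatial distance (g), colour/luma
    similarity (r), motion-vector magnitude (d) and direction (D) similarity, while
    suppressing contributions from pixels marked occluded in the initial map (o)."""
    h, w = err.shape
    out = err.copy()
    mag = np.hypot(mv_u, mv_v)
    ang = np.arctan2(mv_v, mv_u)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            num = den = 0.0
            for j in range(-radius, radius + 1):
                for i in range(-radius, radius + 1):
                    yy, xx = y + j, x + i
                    g = np.exp(-(i * i + j * j) / (2 * sigma_s ** 2))
                    r = np.exp(-((luma[yy, xx] - luma[y, x]) ** 2) / (2 * sigma_r ** 2))
                    d = np.exp(-((mag[yy, xx] - mag[y, x]) ** 2) / (2 * sigma_m ** 2))
                    da = np.angle(np.exp(1j * (ang[yy, xx] - ang[y, x])))   # circular direction difference
                    D = np.exp(-(da ** 2) / (2 * sigma_d ** 2))
                    o = 0.0 if occ[yy, xx] else 1.0                         # exclude occluded neighbours
                    wgt = g * r * d * D * o
                    num += wgt * err[yy, xx]
                    den += wgt
            out[y, x] = num / (den + eps)
    return out
```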
  • the error field is now well-conditioned for a simple, consistent thresholding operation, whereby occlusion field pixels corresponding to error field values below a given threshold are marked as non-occlusion in the final occlusion map O( ), while those greater are marked affirmatively as occlusions in the final occlusion map O( ).
  • O(x,y,t) = \begin{cases} 1 & \text{if } e(x,y,t) > \text{threshold} \\ 0 & \text{if } e(x,y,t) \le \text{threshold} \end{cases}
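  • A one-line realization of this thresholding (Python/NumPy; the function name is illustrative) operates directly on the regularized error field:

```python
import numpy as np

def threshold_occlusion(regularized_err, threshold):
    """Final occlusion map O: 1 where the regularized error exceeds the threshold
    (occlusion), 0 otherwise (non-occlusion)."""
    return (regularized_err > threshold).astype(np.uint8)
```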
  • FIG. 8 illustrates a diagrammatic representation of a machine in the example form of a computer system 800 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.
  • the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, or the Internet.
  • the machine may operate in the capacity of a server machine in client-server network environment.
  • the machine may be a personal computer (PC), a set-top box (STB), a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • the example computer system 800 includes a processing device (processor) 802 , a main memory 804 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM)), a static memory 806 (e.g., flash memory, static random access memory (SRAM)), and a data storage device 816 , which communicate with each other via a bus 808 .
  • Processor 802 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processor 802 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets.
  • the processor 802 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like.
  • the occlusion map generator 245 shown in FIG. 2 may be executed by processor 802 configured to perform the operations and steps discussed herein.
  • the computer system 800 may further include a network interface device 822 .
  • the computer system 800 also may include a video display unit 810 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 812 (e.g., a keyboard), a cursor control device 814 (e.g., a mouse), and a signal generation device 820 (e.g., a speaker).
  • a drive unit 816 may include a computer-readable medium 824 on which is stored one or more sets of instructions (e.g., instructions of the occlusion map generator 245 ) embodying any one or more of the methodologies or functions described herein.
  • the instructions of the occlusion map generator 245 may also reside, completely or at least partially, within the main memory 804 and/or within the processor 802 during execution thereof by the computer system 800 , the main memory 804 and the processor 802 also constituting computer-readable media.
  • the instructions of the occlusion map generator 245 may further be transmitted or received over a network via the network interface device 822 .
  • While the computer-readable storage medium 824 is shown in an example to be a single medium, the term “computer-readable storage medium” should be taken to include a single non-transitory medium or multiple non-transitory media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
  • the term “computer-readable storage medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.
  • the term “computer-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.
  • Examples of the disclosure also relate to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • the high-throughput system and method disclosed herein, which improves the perceptual quality and/or the transmission or storage efficiency of existing image and video compression or transmission systems and methods, solves problems in many fields, such as real-time efficiency for over-the-top video delivery, cost-effective real-time reduction of public radio-access-network congestion when both uploading and downloading video and image data from mobile devices, increased real-time pass-band television delivery capacity, increased satellite transponder capacity, reduced storage costs for content management systems and network DVR architectures, and high-throughput treatment of images and video at the distribution network core, to name but a few examples.
  • Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A method and system for occlusion region detection and measurement between a pair of images are disclosed. A processing device receives a first image and a second image. The processing device estimates a field of motion vectors between the first image and the second image. The processing device motion compensates the first image toward the second image to obtain a motion-compensated image. The processing device compares a plurality of pixel values of the motion-compensated image to a plurality of pixels of the first image to estimate an error field. The processing device inputs the error field to a weighted error cost function to obtain an initial occlusion map. The processing device regularizes the initial occlusion map to obtain a regularized occlusion map.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of and claims the benefit of U.S. utility patent application Ser. No. 14/065,704 filed Oct. 29, 2013, which claims the benefit of U.S. provisional patent application No. 61/769,311 filed Feb. 26, 2013, the disclosures of which are incorporated herein by reference in their entirety.
  • TECHNICAL FIELD
  • The present invention relates generally to digital image processing, and more particularly, to a method and system for automatic determination of the presence, location, and type of occlusion regions between a pair of images.
  • BACKGROUND
  • Determining an optical flow or motion vector field between two images, particularly for sequences of video frames and/or fields, is frequently encountered in many high-value video processing tasks such as coding, frame rate conversion, noise reduction, etc. Conventional methods for calculating optical flow encounter several stumbling blocks—many solutions of which are described in U.S. Pat. No. 8,355,534 (hereinafter, “the '534 patent”), incorporated herein by reference in its entirety. As taught in the '534 patent, object occlusion presents a challenge for any motion estimation system, such as an optical flow estimation system.
  • FIG. 1 shows an example of an image pair 100 a, 100 b, with background 105 and foreground 110, where a foreground object 115 is in motion and which shows an occlusion region 120 and a disocclusion region 125. When the foreground object 115 is in motion in a video sequence, background pixels of the image 100 b in the forward-motion direction are hidden (known herein as occlusion or the occlusion region 120) while background pixels of the image 100 b behind the motion are revealed (known herein as disocclusion or the disocclusion region 125). In the occluded areas of an image, there is no definite motion attributable to the background; concomitantly, there is no definite motion attributable to the foreground object in disoccluded regions of the image. These two types of areas within a pair of images (collectively known herein as occlusion regions) are very problematic for motion estimation in general, and for many optical flow systems in particular, because erroneous motion vector values in these regions tend to propagate into non-occlusion regions, adversely affecting the overall accuracy of the optical flow estimation. Determination of occlusion regions has many benefits for other high-value video analysis tasks in addition to improvement of optical flow and motion estimation, such as disparity and depth estimation, image segmentation, object identification, and 3D conversion and projection.
  • The detection of occlusion has received much attention in the context of motion estimation, depth estimation and image/video segmentation. Occlusion can be estimated or computed explicitly or implicitly. Occlusion boundaries themselves provide strong cues for 3D scene reconstruction. Methods as described in A. Saxena, M. Sun, and A. Y. Ng, "Make 3D: Learning 3D Scene structure from a Single Image," PAMI, 31: 824-840, 2009, and in D. Hoiem, A. A. Efros, and A. Hebert, "Recovering Occlusion Boundaries from an Image," International Journal on Computer Vision, pages 1-19, 2010, propose to find occlusion boundaries using a single frame by over-segmentation and supervised-learning. With no motion information, occlusion boundary detection is an inherently ambiguous problem. Other methods attempt to layer input video into flexible sprites to infer occluded pixels/regions (see e.g., N. Jojic and B. J. Frey, "Learning Flexible Sprites in Video layers," in CVPR, 2001). Layered methods provide realistic modeling of occlusion boundaries, but these methods need to have continuous regions, relative order of surfaces, and predetermined motion. The method described in Sun, D., Sudderth, E. B., Black, M. J., "Layered image motion with explicit occlusions, temporal consistency, and depth ordering," in: Advances in Neural Information Processing Systems, pp. 2226-2234 (2010), explicitly models occlusion and the results obtained are relatively accurate, but the method possesses a huge computational load. Finding occlusion regions represents a common problem in multi-view 3D projection and display methods. The most recently researched methods in this area are still prone to gross errors when the background or foreground underlying pixel data in these regions is homogeneous or has no texture information.
  • In Alvarez, et al., "Symmetrical dense optical flow estimation with occlusions detection," International Journal of Computer Vision 75(3), 371-385 (2007), (hereinafter, Alvarez), passing interest is focused on the role of the diffusion tensor and subsequent eigenvalue analysis, but this is only used to analyze the forward and backward symmetry of the optical flow solution, and not used to directly improve the accuracy of either the optical flow computation or the occlusion computation.
  • Ince, S., Konrad, J., “Occlusion-aware optical flow estimation,” IEEE Trans. Image Processing 17(8), 1443-1451 (2008), (hereinafter, “Ince”), discloses a method and systems for joint determination of optical flow and occlusion, but the systems are coupled and this method is not applicable for coupling to a non-optical-flow motion estimation system, such as block matching. Further, Ince ignores the notion of either a diffusion tensor or structure tensor of the images in order to improve robustness.
  • Motion cues are very important for identifying occlusion regions and boundaries. As described above, the objective of any motion estimation is to compute a flow field that represents the motion of points in two consecutive frames, and the most accurate motion estimation techniques should be able to handle occlusions. Some occlusion detection work based on motion as described in Alvarez and Ince, jointly estimates backward and forward motion and marks inconsistent pixels as occluded regions. In such circumstances, occlusion is detected implicitly and the occlusion detection is coupled with the motion estimation method itself. These methods encounter problems within highly textured imagery areas and do not succeed with large displacements or occlusion regions.
  • Xiao, et al., “Bilateral Filtering-Based Optical Flow Estimation with Occlusion Detection,” Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I. LNCS, vol. 3951, pp. 211-224, Springer, Heidelberg (2006) discloses another joint method for computing optical flow and occlusion, but its two computations are closely coupled into a joint regularization framework. Further, this method requires multiple iterations for convergence of the disclosed regularization function and is therefore not suitable for real-time computation for contemporaneous video resolutions such as 1080 and 4K.
  • Even the best conventional motion estimation methods with coupled occlusion detection systems suffer from two primary disadvantages. First, these methods are too computationally complex for real-time processing. Second, the occlusion region maps they produce are inherently noisy. Pixels marked as occlusions may frequently be false-positives or false-negatives, rendering their usage in subsequent video processing and analysis tasks challenging or impossible.
  • Accordingly, there is a need for an accurate, precise, low-computational complexity occlusion estimation system and method that in conjunction with a motion estimation system, increases the robustness and accuracy of such a system in the presence of large motions and resulting large occlusion regions.
  • BRIEF SUMMARY OF THE INVENTION
  • The above-described problems are addressed and a technical solution is achieved in the art by providing a method and system for occlusion region detection and measurement between a pair of images. A processing device receives a first image and a second image. The processing device estimates a field of motion vectors between the first image and the second image. The processing device motion compensates the first image toward the second image to obtain a motion-compensated image. The processing device compares a plurality of pixel values of the motion-compensated image to a plurality of pixels of the first image to estimate an error field. The processing device inputs the error field to a weighted error cost function to obtain an initial occlusion map. The processing device regularizes the initial occlusion map to obtain a regularized occlusion map.
  • In one example, regularizing may further comprise obtaining a regularized error field. In one example, comparing and regularizing may be repeated until a value based on at least one of the regularized occlusion map or the regularized error field is below a threshold value. In one example, motion compensating the first image toward the second image comprises image warping the field of motion vectors from the first image toward the second image. In one example, the initial occlusion map and the regularized occlusion map are each based on a weighted error cost function. The weighted error cost function may be at least one of a sum-of-square differences measure, a locally scaled sum-of-square differences measure, a normalized cross-correlation measure, or a zero-mean normalized cross-correlation measure. The weighted error cost function may be based on a local weighting over a local region of support. In one example, the local weighting over a local region of support may be based on an eigensystem analysis of the local structure tensor of the motion-compensated image. In another example, the local weighting over a local region of support is a gradient-energy weighting over the local region of support. The gradient-energy weighting over a local region of support may be a sum of statistical variance or local contrast over the local region of support.
  • In one example, regularizing the occlusion map to obtain a regularized occlusion map may comprise applying a multi-sigma regularization to the occlusion map. Applying a multi-sigma regularization to the occlusion map may comprise applying a 4-factor sigma filter to the occlusion map. Input weights for the multi-factor sigma filter may comprise an initial coarse occlusion field estimate and, between the first image and the second image, one or more of similarities of color value or luminance, similarities of circular values of motion vector directions, or similarities of motion vector magnitudes. The multi-factor sigma filter may incorporate one or more weights, such as depth, or discontinuities of a range-to-target field.
  • Examples of the present disclosure provide a method and system for detecting and characterizing occlusion regions without any assumptions that depend on scene types, motion types, or supervised learning datasets. Examples of the present disclosure provide an accurate and precise occlusion region map. The occlusion detection is decoupled from the motion estimation itself, providing for flexible addition to any suitable optical flow or motion estimation system or method.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows an example of an image pair with background and foreground, where the foreground object is in motion and shows occlusion and disocclusion regions.
  • FIG. 2 is a block diagram of an example computing system for detecting one or more occlusion regions in an image sequence, in which examples of the present disclosure may operate.
  • FIG. 3 shows an exemplary occlusion field/map.
  • FIG. 4 is a flow diagram illustrating an example of a method for detecting occlusion regions and/or disocclusion regions in a sequence of images using the computing system of FIG. 2.
  • FIG. 5 is a block diagram of an example data flow between modules that implement the method of FIG. 4.
  • FIG. 6 is a block diagram of an example data flow between modules that implement a weighted distance field module of FIG. 5.
  • FIG. 7 is a block diagram of an example data flow through a 4-factor sigma filter employed in a regularization module of FIG. 5.
  • FIG. 8 illustrates a diagrammatic representation of a machine in the example form of a computer system within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.
  • DETAILED DESCRIPTION
  • Motion estimation has been extensively explored in the related art and a determination of accurate motion vectors between images is still a challenging problem. Several solutions have historically been used for motion estimation, such as simple block matching, hierarchical block matching, and optical flow estimation. In order to estimate an accurate motion vector field, the occlusion problem needs to be explicitly confronted. Occlusion detection is also very important in estimating a disparity map for stereo matching. Occlusion marking has independent applications beyond motion estimation and disparity map estimation, such as in video surveillance object sorting/object removal and re-projection of multi-view video and imagery. Also, marking occlusions is very useful for image segmentation, motion segmentation, image in-painting, and disparity in-painting.
  • Occlusion occurs under one of the following conditions: a camera or capture system zooms in or out, a new object appears in-frame, an old object disappears from frame, or moving foreground objects reveal background pixels. For example, in stereoscopic image pairs, two images are captured from different angles and some pixels exist in only one view. As used herein, these pixels are known as occluded pixels. Similarly, in video surveillance, a person moving away from a camera or a new person appearing in front of the camera introduces occlusion.
  • FIG. 2 is a block diagram of an example computing system 200 for detecting one or more occlusion regions in an image sequence, in which examples of the present disclosure may operate. By way of non-limiting example, the computing system 200 receives data from one or more data sources 205, such as a video camera or a still camera or an on-line storage device or transmission medium. The computing system 200 may also include a digital video capture system 210 and a computing platform 215. The digital video capturing system 210 processes streams of digital video, or converts analog video to digital video, to a form which can be processed by the computing platform 215 as data source 205. The computing platform 215 comprises a host system 220 which may comprise, for example, a processing device 225, such as one or more central processing units 230 a-230 n. The processing device 225 is coupled to a host memory 235.
  • The processing device may further implement a graphics processing unit 240 (GPU). It will be appreciated by those skilled in the art that other co-processor architectures may be utilized besides GPUs, such as, but not limited to, DSPs, FPGAs, or ASICs, or adjunct fixed-function features of the processing device 225 itself. It will further be appreciated by those skilled in the art that the GPU 240 may be collocated on the same physical chip or logical device as the central processing units 230 a-230 n, also known as an “APU”, such as found on mobile phones and tablets. Separate GPU and CPU functions may be found on computer server systems where the GPU is a physical expansion card, and personal computer systems and laptops. The GPU 240 may comprise a GPU memory 237. It will be appreciated by those skilled in the art that the host memory 235 and GPU memory 237 may also be collocated on the same physical chip(s) or logical device, such as on an APU.
  • The processing device 225 is configured to implement an occlusion map generator 245 for detecting occlusion regions and/or disocclusion regions in a sequence of images. The occlusion map generator 245 may be configured to receive data (e.g., a first image and a second image) from the data source 205, and to receive an image data buffer 250, which is transferred to the GPU memory 237 as image buffer 255. In one example, the processing device 225 may implement the occlusion map generator 245 as a component of the GPU 240. The occlusion map generator 245 is configured to obtain a regularized occlusion map from the image buffer 255 as shown in FIG. 3. As used herein, the totality of occluded regions for a given image is referred to as an occlusion map. In one example, the regularized occlusion map may be displayed on a display 270. In another example, the occlusion map generator 245 may transmit the regularized occlusion map to one or more downstream devices 290 directly or through a network 295.
  • FIG. 4 is a flow diagram illustrating an example of a method 400 for detecting occlusion regions and/or disocclusion regions in a sequence of images. The method 400 may be performed by a computer system 200 of FIG. 2 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 400 is performed by the occlusion map generator 245 of the computing system 200 of FIG. 2.
  • As shown in FIG. 4, to permit the computing system 200 to generate a regularized occlusion map from a sequence of images, at block 410, the occlusion map generator 245 estimates a field of motion vectors between a first image and a second image based on the received data from the image buffer 255. At block 420, the occlusion map generator 245 operates on the field of motion vectors to motion compensate the first image toward the second image to obtain a motion-compensated image. In one example, when the occlusion map generator 245 motion compensates the first image toward the second image, the occlusion map generator 245 image warps the field of motion vectors from the first image toward the second image.
  • At block 430, the occlusion map generator 245 compares a plurality of pixel values of the motion-compensated image to a plurality of pixels of the first image to estimate an error field. At block 440, the occlusion map generator 245 inputs the error field to a weighted error cost function to obtain an initial occlusion map. In one example, the weighted error cost function may be at least one of a sum-of-square differences measure, a locally scaled sum-of-square differences measure, a normalized cross-correlation measure, or a zero-mean normalized cross-correlation measure. In one example, the weighted error cost function may be based on a local weighting over a local region of support. The local weighting over a local region of support may be based on an eigensystem analysis of the local structure tensor of the motion-compensated image. In one example, the local weighting over a local region of support may be a gradient-energy weighting over the local region of support. In an example, the gradient-energy weighting over a local region of support may be a sum of statistical variance or local contrast over the local region of support.
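  • One plausible realization of a gradient-energy weighting based on local statistical variance (Python with NumPy and SciPy; the function name, window size, and normalization are illustrative assumptions) is:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance_weights(img, size=3, eps=1e-8):
    """Local statistical variance over the region of support, normalized so that
    flat (homogeneous) regions receive low weight in the error cost function."""
    img = img.astype(float)
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    var = np.maximum(mean_sq - mean * mean, 0.0)   # E[x^2] - E[x]^2, clipped for round-off
    return var / (var.max() + eps)
```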
  • At block 450, the occlusion map generator 245 regularizes the initial occlusion map to obtain a regularized occlusion map. In one example, when the occlusion map generator 245 regularizes the initial occlusion map, the occlusion map generator 245 may further regularize the error field.
  • In one example, regularizing the occlusion map to obtain a regularized occlusion map may comprise applying a multi-sigma regularization to the occlusion map and the error field. In one example, applying a multi-sigma regularization to the occlusion map and error field may comprise applying a 4-factor sigma filter to the occlusion map.
  • At block 460, if a value based on at least one of the regularized occlusion map or the regularized error field is above a threshold value, the occlusion map generator 245 repeats the comparing and the regularizing steps (i.e., blocks 430 through 460); otherwise, processing terminates. A minimal sketch of this loop is given below.
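  • By way of illustration only, the loop of blocks 410-460 can be summarized as follows. This is a minimal Python/NumPy sketch, assuming single-channel floating-point frames; the helper names (estimate_motion, warp_scatter, weighted_error, regularize), the default threshold, and the iteration cap are assumptions made for readability and stand in for the motion estimation, scatter warping, weighted cost, and regularization stages described later in this disclosure.

```python
import numpy as np

def generate_occlusion_map(prev_frame, curr_frame,
                           estimate_motion, warp_scatter,
                           weighted_error, regularize,
                           threshold=0.005, max_iters=4):
    # Block 410: estimate a field of motion vectors between the two images.
    mv_u, mv_v = estimate_motion(prev_frame, curr_frame)

    # Block 420: motion compensate the first image toward the second image
    # (scatter warp); unvisited pixels seed the initial occlusion map.
    warped, occlusion = warp_scatter(prev_frame, mv_u, mv_v)

    for _ in range(max_iters):
        # Blocks 430-440: compare warped and original pixels under a weighted
        # error cost function to obtain an error field / initial occlusion map.
        error = weighted_error(warped, curr_frame)

        # Block 450: regularize the occlusion map (and the error field).
        occlusion, error = regularize(error, occlusion, mv_u, mv_v,
                                      prev_frame, curr_frame)

        # Block 460: repeat while a value derived from the regularized fields
        # remains above the threshold; otherwise stop.
        if float(np.mean(error)) <= threshold:
            break
    return occlusion
```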
  • Let f(x,y,t) denote the current frame and f(x,y,t−1) the previous frame. The relationship between f(x,y,t) and f(x,y,t−1) can be expressed as Eq. 1:
  • $\hat{f}(x,y,t) = \begin{cases} W_{t-1\Rightarrow t}\big(f(x,y,t-1)\big) & \text{(non-occluded pixels)} \\ o(x,y,t) & \text{(occluded pixels)} \end{cases}$  (Eq. 1)
  • where $W_{t-1\Rightarrow t}$ denotes the motion-compensated mapping of frame f(x,y,t−1) to frame f(x,y,t).
  • Thus, the operator gives a per-pixel mapping between two frames, and it is applicable to any motion model and to disparity map estimation. Equation (1) holds everywhere in the video frame except in the occluded regions o(x,y,t−1). The totality of occluded regions for a given image is referred to as the occlusion map. This map represents a gray scale image mask, or alternatively a 2-dimensional matrix of positive values, that in ideal circumstances accurately identifies each pixel as occluded or not, where middle-range values can represent either variations in confidence or the degree of transparency of the occluding object(s). In the present disclosure, the problems addressed are finding the occluded areas in a sequence of images and regularizing the resulting occlusion map to attain temporal stability and to prevent recursive error propagation.
  • FIG. 5 is a block diagram of one example of data flow through a sequence of modules 500 that comprise the occlusion map generator 245 of FIG. 3. A motion estimation module 508 estimates motion vectors of a motion vector field 502 from a first frame 504 (e.g., a previous frame 504) f(x,y,t−1) to a second frame 506 (e.g., an original frame 506) f(x,y,t). The motion estimation module 508 may implement a motion estimation method which can be, for example, a sub-pixel hierarchical block-based method, optical flow, or recursive disparity estimation for stereoscopic pairs. Based on the calculated motion vectors mv_u(x,y,t−1) and mv_v(x,y,t−1), a motion compensated warping module 510 may apply a motion compensation warping function $W_{t-1\Rightarrow t}$ to the motion vector field 502 to obtain a motion warped first frame 512 (e.g., a motion warped previous frame 512). In an example, the function $W_{t-1\Rightarrow t}$ can be expanded as in Eq. 2:

  • $\hat{f}\big(x + mv_u(x,y,t-1),\ y + mv_v(x,y,t-1),\ t\big) \approx f(x,y,t-1)$  (Eq. 2)
  • It will be appreciated by those skilled in the art that any number of motion compensation regimes may be employed, and the above example is provided for clarity of explanation. The occluded regions will not have true motion vectors since no information was available for the motion estimation system. An ideal motion estimation system may populate these areas with zero magnitude motion vectors or at least signal a lack of confidence in their accuracy by an out-of-band method such as by communicating a confidence map.
  • It is important to distinguish the warping method described here from typical motion compensation regimes. In most motion compensation systems, a “gather” method is applied, such that:

  • $\hat{f}(x,y,t) \approx f\big(x + mv_u(x,y,t-1),\ y + mv_v(x,y,t-1),\ t-1\big)$  (Eq. 3)
  • While the regime of Eq. 3 guarantees that every destination pixel in the compensation will be visited, producing a dense image, it also ignores occlusion. By comparison, the warping method of Eq. 2 can be described as a “scatter” method, whereby not every pixel of the compensated image is guaranteed to be visited, or filled in. By pre-filling the compensated image buffer with a signal value, unvisited regions in the scatter-based warping operation are left with this signal value undisturbed. This, in turn, forms the starting point of the occlusion map 514, as illustrated in the sketch below.
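  • As a non-authoritative illustration of the scatter-based warping and sentinel pre-fill just described, the following Python/NumPy sketch assumes a single-channel frame, nearest-neighbor splatting of sub-pixel motion vectors, and a negative sentinel value; these simplifications are assumptions made for clarity rather than requirements of the method.

```python
import numpy as np

def warp_scatter(prev_frame, mv_u, mv_v, sentinel=-1.0):
    """Scatter-style warp in the spirit of Eq. 2: each source pixel is pushed
    forward along its motion vector. Destination pixels that are never visited
    keep the pre-filled sentinel value and seed the initial occlusion map."""
    h, w = prev_frame.shape
    warped = np.full_like(prev_frame, sentinel)   # pre-fill with the signal value
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbor splat (sub-pixel interpolation omitted for brevity).
    xd = np.clip(np.rint(xs + mv_u).astype(int), 0, w - 1)
    yd = np.clip(np.rint(ys + mv_v).astype(int), 0, h - 1)
    warped[yd, xd] = prev_frame[ys, xs]           # "scatter": not every target is hit
    occlusion_seed = (warped == sentinel).astype(np.float32)
    return warped, occlusion_seed
```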
  • Next, since the second or original (true) frame 506 f(x,y,t) is known, a weighting function block 516 may apply a weighting function to the motion warped first (e.g., previous) frame 512 to obtain a weighting field 513, which informs subsequent processing steps of the weight to be applied to error cost function analysis for each pixel. In an example, the weighting field 513 may be stored in a separate data buffer or in the alpha channel or fourth channel of the motion warped first (e.g., previous) frame 512. The weighting function may comprise a simple identity function, or something more complex such as an eigensystem analysis of the local structure tensor.
  • A weighted error cost function block 518 may apply a weighted error cost function using the weights supplied by the weighting field 513. Error pixels for the motion warped first (e.g., previous) frame 512 can be calculated from the weighted error cost function, and thereby the occluded areas can be further marked while avoiding the areas already marked.
  • Nevertheless, special consideration is needed when choosing error cost functions to estimate error, since simple per-pixel differences (known commonly as Sum-of-Absolute-Differences) or un-weighted sums of squared differences may mark false negatives in homogeneous regions and objects with low texture, and false positives in image regions with strong contrast and edges. The following correlation-based similarity measures (see, e.g., Nuno Roma, José Santos-Victor, José Tomé, “A Comparative Analysis Of Cross-Correlation Matching Algorithms Using a Pyramidal Resolution Approach,” 2002) for estimating the error pixels for each pixel in a frame are instructive for the purpose of the preferred embodiment, but are not a limiting example:
  • Sum of Squared Differences (SSD),
  • $e(x,y,t) = \sum_{(i,j)\in R} \big(I(i,j,t) - \hat{I}(x+i,\ y+j,\ t)\big)^2$  (Eq. 4)
  • Locally scaled Sum of Squared Differences (LSSD),
  • $e(x,y,t) = \sum_{(i,j)\in R} \Big(I(i,j,t) - \dfrac{\overline{I}(i,j,t)}{\overline{\hat{I}}(x,y,t)}\,\hat{I}(x+i,\ y+j,\ t)\Big)^2$  (Eq. 5)
  • Normalized Cross Correlation (NCC), and
  • $e(x,y,t) = \dfrac{\sum_{(i,j)\in R} \big(I(i,j,t) - \hat{I}(x+i,\ y+j,\ t)\big)^2}{\sqrt{\sum_{(i,j)\in R} I(i,j,t)^2 \cdot \sum_{(i,j)\in R} \hat{I}(x+i,\ y+j,\ t)^2}}$  (Eq. 6)
  • Zero-Mean Normalized Cross Correlation (ZNCC):
  • $e(x,y,t) = \dfrac{\sum_{(i,j)\in R} \big(I(i,j,t) - \overline{I}\big)\cdot\big(\hat{I}(x+i,\ y+j,\ t) - \overline{\hat{I}}\big)}{\sqrt{\sum_{(i,j)\in R} \big(I(i,j,t) - \overline{I}\big)^2 \cdot \sum_{(i,j)\in R} \big(\hat{I}(x+i,\ y+j,\ t) - \overline{\hat{I}}\big)^2}}$  (Eq. 7)
  • For equations (4)-(7) shown above, R is the region of support considered for correlation matching; selecting R as 3×3 pixels may be suitable for real-time processing, while 5×5 may be suitable for offline processing. It will be appreciated by those skilled in the art that, over time, larger regions of support may be employed for real-time and offline processing as the underlying system speed and complexity increases. The 3×3 and 5×5 regions of support are provided only as examples; an illustrative evaluation over such a region of support is sketched below.
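  • For illustration, the sketch below evaluates two of the above measures (the SSD of Eq. 4 and the ZNCC of Eq. 7) over a small region of support centered on an interior pixel, and shows how a per-pixel weight from the weighting field 513 might modulate the result; the helper names, the radius parameter, and the epsilon guard are assumptions made for this example.

```python
import numpy as np

def block_ssd(I, I_hat, x, y, radius=1):
    """Sum of Squared Differences (Eq. 4) over a (2*radius+1)^2 region of
    support centered at interior pixel (x, y); radius=1 is the 3x3 example."""
    a = I[y - radius:y + radius + 1, x - radius:x + radius + 1]
    b = I_hat[y - radius:y + radius + 1, x - radius:x + radius + 1]
    return float(np.sum((a - b) ** 2))

def block_zncc(I, I_hat, x, y, radius=1, eps=1e-8):
    """Zero-mean Normalized Cross Correlation (Eq. 7) over the same region."""
    a = I[y - radius:y + radius + 1, x - radius:x + radius + 1]
    b = I_hat[y - radius:y + radius + 1, x - radius:x + radius + 1]
    a0, b0 = a - a.mean(), b - b.mean()
    denom = np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2)) + eps
    return float(np.sum(a0 * b0) / denom)

def weighted_error_at(I, I_hat, weight_field, x, y, radius=1):
    """Per-pixel weighted error: the local-structure weight (cf. weighting
    field 513) scales the raw region-of-support cost before thresholding."""
    return weight_field[y, x] * block_ssd(I, I_hat, x, y, radius)
```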
  • Correlation-based matching metrics are computationally expensive, but since motion vectors are already estimated by an external system, the difference metric can be evaluated over a small region and does not require a search over a larger pixel region of support.
  • In addition to the numerical methods for determining local weighting, such as the global and local means used in Eqs. 4-7 above, an eigensystem analysis can be utilized to provide a more precise and accurate weighting. The methods described in U.S. Pat. No. 8,355,534, incorporated herein by reference, are particularly instructive; of particular note here is the use of the eigenvalues of the gradient structure tensor of the local region of support to determine whether the region is an isotropic, homogeneous region, one containing significant image texture, or one containing a strong contrast edge. Based upon such a tensor analysis, image differences in homogeneous and isotropic regions would be weighted less than, for example, those in a highly textured region.
  • Optical flow motion vectors and disparity maps commonly undergo regularization and smoothing steps to smooth discontinuities and outliers, which further helps to stabilize the motion vector fields along the temporal axis in the case of video. It is noted that the occlusion and error fields benefit from the same kind of treatment by a regularization module 520, applied separately from the motion vector field and the image field.
  • To address the problems of noise, false positives, and false negatives of the prior art in the final resulting occlusion map 514, the weighting function 516 and weighted error cost function 518 may include an eigensystem analysis as depicted in FIG. 6. First, a spatio-temporal gradient estimation 630 may be applied to the field of pixels for the previous frame 610 and the current frame 620 as taught in the '534 patent, which results in a two-dimensional gradient field 640, wherein the gradient derivatives may be estimated, for example, as in Eq. 9:
  • $\sigma_{xx} = \sum_{n=-1}^{1}\sum_{m=-1}^{1} \frac{\partial D_{RGB}(n,m)}{\partial x}\cdot\frac{\partial D_{RGB}(n,m)}{\partial x}\cdot \mathrm{weight}(n,m)$
    $\sigma_{xy} = \sum_{n=-1}^{1}\sum_{m=-1}^{1} \frac{\partial D_{RGB}(n,m)}{\partial x}\cdot\frac{\partial D_{RGB}(n,m)}{\partial y}\cdot \mathrm{weight}(n,m)$
    $\sigma_{yy} = \sum_{n=-1}^{1}\sum_{m=-1}^{1} \frac{\partial D_{RGB}(n,m)}{\partial y}\cdot\frac{\partial D_{RGB}(n,m)}{\partial y}\cdot \mathrm{weight}(n,m)$
    $\sigma_{xt} = \sum_{n=-1}^{1}\sum_{m=-1}^{1} \frac{\partial D_{RGB}(n,m)}{\partial x}\cdot\frac{\partial}{\partial t}\!\left(\frac{\partial D_{RGB}(n,m)}{\partial x}\right)\cdot \mathrm{weight}(n,m)$
    $\sigma_{yt} = \sum_{n=-1}^{1}\sum_{m=-1}^{1} \frac{\partial D_{RGB}(n,m)}{\partial y}\cdot\frac{\partial}{\partial t}\!\left(\frac{\partial D_{RGB}(n,m)}{\partial y}\right)\cdot \mathrm{weight}(n,m)$  (Eq. 9)
  • The gradient field 640 is input into a gradient tensor analysis 650, where the gradient values are input to a tensor, and the tensor is subjected to eigensystem analysis as in Eq. 10:
  • $2D\_tensor = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{bmatrix}$  (Eq. 10)
  • The eigensystem analysis of Eq. 10 results in two eigenvalues λ1 and λ2 for each pixel, where the combination of the eigenvalues identifies the local structure of the image surrounding that pixel.
  • The eigenvalues obtained from the gradient tensor analysis 650 form the eigenvalue fields 660, which identify the eigenvalues of the local structure tensor for each pixel of the input images 610 and 620. The two eigenvalues λ1 and λ2 for each pixel may influence the weighting function 670 by discounting the error values in regions with high homogeneity (e.g., low λ1 and λ2) and low edge dominance (e.g., low λ1 relative to λ2).
  • After computing a weighting field 675 as in region of support weighting 670, the weighted error cost function 680 computes a weighted error field 690 as described in the weighting function block 516 of FIG. 5.
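  • A hedged sketch of this eigensystem-based weighting follows, assuming a single-channel image, a 3×3 region of support with unit weights, and a simple heuristic that discounts homogeneous regions; the closed-form eigenvalue expression for the symmetric 2×2 tensor of Eq. 10 is standard, but the specific weighting formula here is an illustrative assumption rather than the weighting of the '534 patent.

```python
import numpy as np

def box3(a):
    """Sum over a 3x3 neighborhood (edge-replicated), i.e. the region-of-support
    accumulation of Eq. 9 with weight(n, m) = 1."""
    p = np.pad(a, 1, mode='edge')
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def structure_tensor_weights(image, k=1e-4):
    """Per-pixel eigenvalues of the 2D gradient structure tensor (Eqs. 9-10)
    and one possible weighting that discounts homogeneous, low-texture regions."""
    dy, dx = np.gradient(image.astype(np.float64))
    sxx, sxy, syy = box3(dx * dx), box3(dx * dy), box3(dy * dy)
    # Closed-form eigenvalues of the symmetric tensor [[sxx, sxy], [sxy, syy]].
    half_trace = 0.5 * (sxx + syy)
    disc = np.sqrt(np.maximum(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Illustrative weighting: near zero where both eigenvalues are small
    # (homogeneous/isotropic region), approaching 1 where texture is present.
    weight = (lam1 + lam2) / (lam1 + lam2 + k)
    return lam1, lam2, weight
```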
  • Referring to FIG. 7, the previous frame 710, the current frame 720, the resulting error field 730, and the initial occlusion map 740 may be input to a multi-factor sigma filter 750 that operates on the error field 730 and the initial occlusion map 740 (similar to the well-known two-factor bilateral filter as taught in Tomasi et al., “Bilateral filtering for gray and color images,” International Conference on Computer Vision, (1998) pp. 839-846), but which may include four or more sigmas (factors): an image color distance function (RGB/YUV) based upon the previous frame 710 and the current frame 720, a directional distance function for the 2D motion vectors based on the weighted distance field 730, a magnitude distance function for the 2D motion vectors based on the weighted distance field 730, and occlusion as initially marked in the initial occlusion map 740.
  • The multi-factor sigma filter 750 may be implemented such that, when spatial smoothing is applied, if a pixel location is marked as occluded, its contribution to the filter bank coefficients may be penalized heavily, which in turn avoids unwanted distortions at the object boundaries. Additionally, difference data from regions whose associated motion vector directions or magnitudes are dissimilar may be penalized. The multi-factor sigma filter 750 differs from bilateral filters and their derivatives in many ways, since the originally proposed bilateral filter used only two parameters: spatial radius and image color difference. In one example, a multi-factor sigma filter 750 can be represented in equation form as in Eq. 11:
  • $e'(x,y,t) = \dfrac{\sum_{(i,j)\in\Omega} g(i-x,\ j-y,\ t)\; r\big(I(i,j,t)-I(x,y,t)\big)\; d\big(mv_u(i,j,t)-mv_u(x,y,t)\big)\; D\big(mv_v(i,j,t)-mv_v(x,y,t)\big)\; o(i,j,t)\; e(i,j,t)}{\sum_{(i,j)\in\Omega} g(i-x,\ j-y,\ t)\; r\big(I(i,j,t)-I(x,y,t)\big)\; d\big(mv_u(i,j,t)-mv_u(x,y,t)\big)\; D\big(mv_v(i,j,t)-mv_v(x,y,t)\big)\; o(i,j,t)}$  (Eq. 11)
  • where e( ) is the error field 690 for the image, o( ) represents the initial occlusion field 685 provided by the motion compensated warping module 510, and e′( ) is the resultant regularized occlusion map 760; and where g( ) is the Gaussian spatial distance function, as in Eq. 12:
  • $g(i-x,\ j-y,\ t) = \exp\!\Big(-0.5\,\dfrac{(i-x)^2 + (j-y)^2}{\sigma_s}\Big)$  (Eq. 12)
  • where r( ) of Eq. 11 is the radiosity function, which observes color differences and/or luminance values; in one example, r( ) is a suitable color difference function based on the RGB or YUV values present in an image I, as in Eq. 13:
  • $r\big(I(i,j,t) - I(x,y,t)\big) = \exp\!\Big(-0.5\,\dfrac{\big(fC(I(i,j,t)) - fC(I(x,y,t))\big)^2}{\sigma_i}\Big)$  (Eq. 13)
  • where fC( ) of Eq. 13 may, in one example, transform the RGB or YUV values to an HSV colorspace representation, as in Eqs. 14-19:
  • $maxRGB = \max(R, G, B)$  (Eq. 14)
    $minRGB = \min(R, G, B)$  (Eq. 15)
    $chroma = maxRGB - minRGB$  (Eq. 16)
    $H(\,) = \begin{cases} \mathrm{NaN}, & \text{if } chroma = 0 \\ \dfrac{(G - B)}{chroma} \bmod 6, & \text{if } maxRGB = R \\ \dfrac{(B - R)}{chroma} + 2, & \text{if } maxRGB = G \\ \dfrac{(R - G)}{chroma} + 4, & \text{if } maxRGB = B \end{cases}$  (Eq. 17)
    $V(\,) = chroma$  (Eq. 18)
    $S(\,) = \begin{cases} 0, & \text{if } chroma = 0 \\ \dfrac{chroma}{V}, & \text{otherwise} \end{cases}$  (Eq. 19)
  • and, where function fC( ) measures the color similarity in HSV color space, in one example as in Eq. 20:
  • $fC(\,) = a\Big[\dfrac{\operatorname{atan2}\big(H(i,j,t),\ H(x,y,t)\big) + \pi}{2\pi}\Big] + b\big(S(i,j,t) - S(x,y,t)\big) + c\big(V(i,j,t) - V(x,y,t)\big)$  (Eq. 20)
  • where a, b, and c are user-supplied weighting values, which by way of a non-limiting example may be 0.5, 0.5, and 1.0, respectively; and where the function d( ) of Eq. 11 measures the motion vector similarity, which may include, for example, a simple magnitude difference measurement function as in Eq. 21:
  • $d\big(mv_u(i,j,t) - mv_u(x,y,t)\big) = \exp\!\Big(-0.5\,\dfrac{\big(mv_u(i,j,t) - mv_u(x,y,t)\big)^2}{\sigma_u}\Big)$  (Eq. 21)
  • where d( ) is a function measuring a simple Euclidean distance between motion vectors, and where D( ) is a function as in Eqs. 22-25, whereby a method to independently evaluate motion vector direction similarities is provided:
  • $\theta_1(x,y,t) = \dfrac{\operatorname{atan2}\big(mv(x,y,t)_x,\ mv(x,y,t)_y\big) + \pi}{2\pi}$  (Eq. 22)
    $\theta_2(x,y,t) = \dfrac{\operatorname{atan2}\big(mv(x,y,t-1)_x,\ mv(x,y,t-1)_y\big) + \pi}{2\pi}$  (Eq. 23)
  • As a measure of the directional difference between motion vectors from frame to frame, motion vectors that point opposite to each other (180 degrees opposed) are considered most different, as in Eq. 24. Two such vectors would have differences in direction (theta) and magnitude (distance, or D):

  • $\Delta\theta(x,y,t) = \min\big[\,\lvert\theta_2 - \theta_1\rvert,\ \lvert\theta_2 - 1.0 - \theta_1\rvert,\ \lvert\theta_2 + 1.0 - \theta_1\rvert\,\big]$  (Eq. 24)
  • Further, to numerically emphasize the motion vectors' angular differences, the difference of angles is transformed to a logarithmic scale. If motion vectors for a particular spatial location change direction by a great amount, their difference D( ) is computed on a logarithmic weighting scale as in Eq. 25:

  • $D(x,y,t) = \Delta\theta_{\log}(x,y,t) = 1.442695\,\log\big(1.0 + 2.0\times\Delta\theta\big)$  (Eq. 25)
  • Then, the magnitude differences d( ) of Eq. 21 of the respective optical flow vectors for each spatial location are transformed to a logarithmic scale, which emphasizes large differences as opposed to small ones. In one example, these values are computed as in Eq. 26:
  • $d(x,y,t) = \log\Big\{1.0 + 4.0\times\max\big[\,mv(x,y,t)_x^2 + mv(x,y,t)_y^2,\ \ mv(x,y,t-1)_x^2 + mv(x,y,t-1)_y^2\,\big]\Big\}$  (Eq. 26)
  • The regularization of the output occlusion field values o( ) and error field values e′( ) of Eq. 11 uses not just the spatial radius considered, but also the differences in motion vectors, the image luminance, and the occlusion markings. This excludes the occluded areas from the operation and does not introduce distortions due to imperfect motion estimation vectors.
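  • To make the regularization step concrete, the following Python sketch applies a 4-factor sigma filter in the spirit of Eq. 11, combining the spatial term of Eq. 12, a luminance-only stand-in for the color term of Eqs. 13-20, the motion-vector magnitude term of Eq. 21, and the circular direction term of Eqs. 22-25, gated by the occlusion markings. The loop-based form, the sigma defaults, the luminance simplification, and the way the direction difference is inverted into a weight are all assumptions made for readability; a practical implementation would be vectorized or run on the GPU.

```python
import numpy as np

def multi_factor_sigma_filter(error, occlusion, mv_u, mv_v, luma,
                              radius=3, sigma_s=2.0, sigma_i=0.05, sigma_u=1.0):
    """Regularize the error field with spatial, luminance, motion-vector
    magnitude, motion-vector direction, and occlusion factors (cf. Eq. 11)."""
    h, w = error.shape
    out = np.empty_like(error)
    # Normalized motion-vector angles in [0, 1), in the spirit of Eqs. 22-23.
    theta = (np.arctan2(mv_u, mv_v) + np.pi) / (2.0 * np.pi)
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    g = np.exp(-0.5 * ((i - x) ** 2 + (j - y) ** 2) / sigma_s)    # Eq. 12
                    r = np.exp(-0.5 * (luma[j, i] - luma[y, x]) ** 2 / sigma_i)   # cf. Eq. 13
                    d = np.exp(-0.5 * (mv_u[j, i] - mv_u[y, x]) ** 2 / sigma_u)   # Eq. 21
                    dth = min(abs(theta[j, i] - theta[y, x]),
                              abs(theta[j, i] - 1.0 - theta[y, x]),
                              abs(theta[j, i] + 1.0 - theta[y, x]))               # Eq. 24
                    # Eq. 25 yields a *difference*; here it is inverted into a weight.
                    D = 1.0 / (1.0 + 1.442695 * np.log(1.0 + 2.0 * dth))
                    o = 0.01 if occlusion[j, i] > 0.5 else 1.0  # occluded neighbors penalized
                    wgt = g * r * d * D * o
                    num += wgt * error[j, i]
                    den += wgt
            out[y, x] = num / den if den > 0.0 else error[y, x]
    return out
```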
  • In turn, the error field is now well-conditioned for a simple, consistent thresholding operation, whereby occlusion field pixels corresponding to error field values below a given threshold are marked as non-occlusion in the final occlusion map O( ), while those above it are marked affirmatively as occlusions. A non-limiting example is provided in Eq. 27:
  • $O(x,y,t) = \begin{cases} 0, & \text{if } e'(x,y,t) > \mathrm{threshold} \\ 1, & \text{if } e'(x,y,t) \le \mathrm{threshold} \end{cases}$  (Eq. 27)
  • wherein computation of function O( ) results in the final refined occlusion map, as shown in FIG. 5, where typical thresholds range from 0.003 to 0.006, by way of a non-limiting example.
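  • A minimal sketch of this final thresholding step follows; it mirrors the piecewise form of Eq. 27 as written, with the default threshold chosen from the 0.003-0.006 range mentioned above as an illustrative assumption.

```python
import numpy as np

def finalize_occlusion_map(regularized_error, threshold=0.005):
    """Final refined occlusion map O(x, y, t) per Eq. 27: 1 where the
    regularized error is at or below the threshold, 0 where it exceeds it."""
    return (regularized_error <= threshold).astype(np.uint8)
```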
  • FIG. 8 illustrates a diagrammatic representation of a machine in the example form of a computer system 800 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In some examples, the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, or the Internet. The machine may operate in the capacity of a server machine in a client-server network environment. The machine may be a personal computer (PC), a set-top box (STB), a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • The example computer system 800 includes a processing device (processor) 802, a main memory 804 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM)), a static memory 806 (e.g., flash memory, static random access memory (SRAM)), and a data storage device 816, which communicate with each other via a bus 808.
  • Processor 802 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processor 802 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processor 802 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The occlusion map generator 245 shown in FIG. 2 may be executed by processor 802 configured to perform the operations and steps discussed herein.
  • The computer system 800 may further include a network interface device 822. The computer system 800 also may include a video display unit 810 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 812 (e.g., a keyboard), a cursor control device 814 (e.g., a mouse), and a signal generation device 820 (e.g., a speaker).
  • A drive unit 816 may include a computer-readable medium 824 on which is stored one or more sets of instructions (e.g., instructions of the occlusion map generator 245) embodying any one or more of the methodologies or functions described herein. The instructions of the occlusion map generator 245 may also reside, completely or at least partially, within the main memory 804 and/or within the processor 802 during execution thereof by the computer system 800, the main memory 804 and the processor 802 also constituting computer-readable media. The instructions of the occlusion map generator 245 may further be transmitted or received over a network via the network interface device 822.
  • While the computer-readable storage medium 824 is shown in an example to be a single medium, the term “computer-readable storage medium” should be taken to include a single non-transitory medium or multiple non-transitory media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable storage medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term “computer-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.
  • In the above description, numerous details are set forth. It is apparent, however, to one of ordinary skill in the art having the benefit of this disclosure, that examples of the disclosure may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the description.
  • Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
  • It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “receiving”, “writing”, “maintaining”, or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
  • Examples of the disclosure also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. The high-throughput system and method disclosed herein, which improves the perceptual quality and/or the transmission or storage efficiency of existing image and video compression or transmission systems and methods, solves problems in many fields, such as real-time efficiency for over-the-top video delivery, cost-effective real-time reduction of public radio-access-network congestion when both uploading and downloading video and image data from mobile devices, increased real-time pass-band television delivery capacity, increase of satellite transponder capacity, reduction of storage costs for content management systems and network DVR architectures, and high-throughput treatment of images and video at the distribution network core, to name but a few examples.
  • Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.
  • The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. Example structure for a variety of these systems appears from the description herein. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.
  • It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other examples will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (25)

What is claimed is:
1. A method for occlusion region detection, comprising:
receiving, at a processing device, a first image and a second image;
estimating, using the processing device, a field of motion vectors between the first image and the second image;
motion compensating, using the processing device, the first image toward the second image to obtain a motion-compensated image;
comparing, using the processing device, a plurality of pixel values of the motion-compensated image to a plurality of pixels of the first image to estimate an error field;
inputting, using the processing device, the error field to a weighted error cost function to obtain an initial occlusion map; and
regularizing, using the processing device, the initial occlusion map to obtain a regularized occlusion map.
2. The method of claim 1, wherein regularizing further comprises obtaining a regularized error field.
3. The method of claim 2, further comprising repeating said comparing and said regularizing until a value based on at least one of the regularized occlusion map or the regularized error field is below a threshold value.
4. The method of claim 1, wherein motion compensating the first image toward the second image comprises image warping the field of motion vectors from the first image toward the second image.
5. The method of claim 1, wherein the initial occlusion map and the regularized occlusion map are each based on a weighted error cost function.
6. The method of claim 5, wherein the weighted error cost function is at least one of a sum-of-square differences measure, a locally scaled sum-of-square differences measure, a normalized cross-correlation measure, or a zero-mean normalized cross-correlation measure.
7. The method of claim 6, wherein the weighted error cost function is based on a local weighting over a local region of support.
8. The method of claim 7, wherein the local weighting over a local region of support is based on an eigensystem analysis of the local structure tensor of the motion-compensated image.
9. The method of claim 7, wherein the local weighting over a local region of support is a gradient-energy weighting over the local region of support.
10. The method of claim 9, wherein the gradient-energy weighting over a local region of support is a sum of statistical variance or local contrast over the local region of support.
11. The method of claim 1, wherein regularizing the occlusion map to obtain a regularized occlusion map comprises applying a multi-sigma filter to the occlusion map.
12. The method of claim 11, wherein applying a multi-sigma filter to the occlusion map comprises applying a 4-factor sigma filter to the occlusion map.
13. The method of claim 12, wherein input weights for the multi-factor sigma filter comprise an initial coarse occlusion field estimate and, between the first image and the second image, one or more of similarities of color value or luminance, similarities of circular values of motion vector directions, or similarities of motion vector magnitudes.
14. The method of claim 13, wherein the multi-factor sigma filter incorporates one or more weights, such as depth, or discontinuities of a range-to-target field.
15. A system, comprising:
a memory;
a processing device coupled to and having use of the memory, the processing device to:
receive a first image and a second image;
estimate a field of motion vectors between the first image and the second image;
motion compensate the first image toward the second image to obtain a motion-compensated image;
compare a plurality of pixel values of the motion-compensated image to a plurality of pixels of the first image to estimate an error field;
input the error field to a weighted error cost function to obtain an initial occlusion map; and
regularize the initial occlusion map to obtain a regularized occlusion map.
16. The system of claim 15, wherein regularizing further comprises obtaining a regularized error field.
17. The system of claim 16, further comprising repeating said comparing and said regularizing until a value based on at least one of the regularized occlusion map or the regularized error field is below a threshold value.
18. A non-transitory computer-readable storage medium including instructions that, when accessed by a processing device, cause the processing device to perform operations comprising:
receiving a first image and a second image;
estimating a field of motion vectors between the first image and the second image;
motion compensating the first image toward the second image to obtain a motion-compensated image;
comparing a plurality of pixel values of the motion-compensated image to a plurality of pixels of the first image to estimate an error field;
inputting the error field to a weighted error cost function to obtain an initial occlusion map; and
regularizing the initial occlusion map to obtain a regularized occlusion map.
19. The non-transitory computer-readable storage medium of claim 18, wherein regularizing further comprises obtaining a regularized error field.
20. The non-transitory computer-readable storage medium of claim 19, further comprising repeating said comparing and said regularizing until a value based on at least one of the regularized occlusion map or the regularized error field is below a threshold value.
21. The non-transitory computer-readable storage medium of claim 18, wherein the initial occlusion map and the regularized occlusion map are each based on a weighted error cost function.
22. The non-transitory computer-readable storage medium of claim 21, wherein the weighted error cost function is based on a local weighting over a local region of support.
23. The non-transitory computer-readable storage medium of claim 22, wherein the local weighting over a local region of support is based on an eigensystem analysis of the local structure tensor of the motion-compensated image.
24. The non-transitory computer-readable storage medium of claim 18, wherein regularizing the occlusion map to obtain a regularized occlusion map comprises applying a multi-sigma filter to the occlusion map.
25. The non-transitory computer-readable storage medium of claim 24, wherein applying a multi-sigma filter to the occlusion map comprises applying a 4-factor sigma filter to the occlusion map.
US14/217,655 2013-02-26 2014-03-18 Digital processing method and system for determination of object occlusion in an image sequence Expired - Fee Related US8831288B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/217,655 US8831288B1 (en) 2013-02-26 2014-03-18 Digital processing method and system for determination of object occlusion in an image sequence

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201361769311P 2013-02-26 2013-02-26
US14/065,704 US8718328B1 (en) 2013-02-26 2013-10-29 Digital processing method and system for determination of object occlusion in an image sequence
US14/217,655 US8831288B1 (en) 2013-02-26 2014-03-18 Digital processing method and system for determination of object occlusion in an image sequence

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/065,704 Continuation US8718328B1 (en) 2013-02-26 2013-10-29 Digital processing method and system for determination of object occlusion in an image sequence

Publications (2)

Publication Number Publication Date
US20140241582A1 true US20140241582A1 (en) 2014-08-28
US8831288B1 US8831288B1 (en) 2014-09-09

Family

ID=50552869

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/065,704 Expired - Fee Related US8718328B1 (en) 2013-02-26 2013-10-29 Digital processing method and system for determination of object occlusion in an image sequence
US14/217,655 Expired - Fee Related US8831288B1 (en) 2013-02-26 2014-03-18 Digital processing method and system for determination of object occlusion in an image sequence

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/065,704 Expired - Fee Related US8718328B1 (en) 2013-02-26 2013-10-29 Digital processing method and system for determination of object occlusion in an image sequence

Country Status (7)

Country Link
US (2) US8718328B1 (en)
EP (1) EP2962247A4 (en)
JP (1) JP2016508652A (en)
KR (1) KR20150122715A (en)
CN (1) CN105074726A (en)
CA (1) CA2899401A1 (en)
WO (1) WO2014133597A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105074726A (en) * 2013-02-26 2015-11-18 A2Z罗基克斯公司 Determination of object occlusion in an image sequence
CN104735360B (en) * 2013-12-18 2017-12-22 华为技术有限公司 Light field image treating method and apparatus
KR102214934B1 (en) * 2014-07-18 2021-02-10 삼성전자주식회사 Stereo matching apparatus and method using unary confidences learning and pairwise confidences learning
EP2975850A1 (en) * 2014-07-18 2016-01-20 Thomson Licensing Method for correcting motion estimation between at least two frames of a video sequence, corresponding device, computer program and non-transitory computer-readable medium
WO2017151414A1 (en) * 2016-03-02 2017-09-08 Covidien Lp Systems and methods for removing occluding objects in surgical images and/or video
US10277844B2 (en) * 2016-04-20 2019-04-30 Intel Corporation Processing images based on generated motion data
CN106023250B (en) * 2016-05-16 2018-09-07 长春理工大学 A kind of evaluation method of image recognition and target masking intensity in tracking
KR102655949B1 (en) 2018-05-30 2024-04-09 삼성전자주식회사 Face verifying method and apparatus based on 3d image
CN109087332B (en) * 2018-06-11 2022-06-17 西安电子科技大学 Block correlation-based occlusion detection method
WO2020050828A1 (en) * 2018-09-05 2020-03-12 Hewlett-Packard Development Company, L.P. Optical flow maps
CN111275801A (en) * 2018-12-05 2020-06-12 ***通信集团广西有限公司 Three-dimensional picture rendering method and device
CN111462191B (en) * 2020-04-23 2022-07-19 武汉大学 Non-local filter unsupervised optical flow estimation method based on deep learning
US11663772B1 (en) * 2022-01-25 2023-05-30 Tencent America LLC Occluder generation for structures in digital applications
CN114928730B (en) * 2022-06-23 2023-08-22 湖南国科微电子股份有限公司 Image processing method and image processing apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060078180A1 (en) * 2002-12-30 2006-04-13 Berretty Robert-Paul M Video filtering for stereo images
US20070064802A1 (en) * 2005-09-16 2007-03-22 Sony Corporation Adaptive area of influence filter for moving object boundaries
US7408986B2 (en) * 2003-06-13 2008-08-05 Microsoft Corporation Increasing motion smoothness using frame interpolation with motion analysis
US20110206127A1 (en) * 2010-02-05 2011-08-25 Sensio Technologies Inc. Method and Apparatus of Frame Interpolation
US20120312961A1 (en) * 2011-01-21 2012-12-13 Headwater Partners Ii Llc Setting imaging parameters for image guided radiation treatment
US20130265388A1 (en) * 2012-03-14 2013-10-10 Qualcomm Incorporated Disparity vector construction method for 3d-hevc
US8718328B1 (en) * 2013-02-26 2014-05-06 Spinella Ip Holdings, Inc. Digital processing method and system for determination of object occlusion in an image sequence

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6157747A (en) * 1997-08-01 2000-12-05 Microsoft Corporation 3-dimensional image rotation method and apparatus for producing image mosaics
WO2009032255A2 (en) * 2007-09-04 2009-03-12 The Regents Of The University Of California Hierarchical motion vector processing method, software and devices
CN101953167B (en) * 2007-12-20 2013-03-27 高通股份有限公司 Image interpolation with halo reduction
US9626769B2 (en) * 2009-09-04 2017-04-18 Stmicroelectronics International N.V. Digital video encoder system, method, and non-transitory computer-readable medium for tracking object regions
JP4991890B2 (en) * 2010-03-01 2012-08-01 株式会社東芝 Interpolated frame generation apparatus and method

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423858B2 (en) 2014-07-21 2019-09-24 Ent. Services Development Corporation Lp Radial histogram matching
CN104700432A (en) * 2015-03-24 2015-06-10 银江股份有限公司 Self-adaptive adhered vehicle separating method
CN107430782A (en) * 2015-04-23 2017-12-01 奥斯坦多科技公司 Method for being synthesized using the full parallax squeezed light field of depth information
CN104867133A (en) * 2015-04-30 2015-08-26 燕山大学 Quick stepped stereo matching method
US9992514B2 (en) * 2015-06-25 2018-06-05 Politechnika Poznanska System and a method for disocluded region coding in a multiview video data stream
US20160381390A1 (en) * 2015-06-25 2016-12-29 Politechnika Poznanska System and a method for disocluded region coding in a multiview video data stream
WO2017131735A1 (en) * 2016-01-29 2017-08-03 Hewlett Packard Enterprise Development Lp Image skew identification
CN106204597B (en) * 2016-07-13 2019-01-11 西北工业大学 A kind of video object dividing method based on from the step Weakly supervised study of formula
CN106204597A (en) * 2016-07-13 2016-12-07 西北工业大学 A kind of based on from the VS dividing method walking the Weakly supervised study of formula
US20180330470A1 (en) * 2017-05-09 2018-11-15 Adobe Systems Incorporated Digital Media Environment for Removal of Obstructions in a Digital Image Scene
US10586308B2 (en) * 2017-05-09 2020-03-10 Adobe Inc. Digital media environment for removal of obstructions in a digital image scene
CN107292912A (en) * 2017-05-26 2017-10-24 浙江大学 A kind of light stream method of estimation practised based on multiple dimensioned counter structure chemistry
US20180357212A1 (en) * 2017-06-13 2018-12-13 Microsoft Technology Licensing, Llc Detecting occlusion of digital ink
US11720745B2 (en) * 2017-06-13 2023-08-08 Microsoft Technology Licensing, Llc Detecting occlusion of digital ink
CN107507232A (en) * 2017-07-14 2017-12-22 天津大学 Stereo Matching Algorithm based on multiple dimensioned iteration
CN107798694A (en) * 2017-11-23 2018-03-13 海信集团有限公司 A kind of pixel parallax value calculating method, device and terminal
CN110069990A (en) * 2019-03-18 2019-07-30 北京中科慧眼科技有限公司 A kind of height-limiting bar detection method, device and automated driving system
WO2022271425A1 (en) * 2021-06-23 2022-12-29 Apple Inc. Point-of-view image warp systems and methods
US11989854B2 (en) 2021-06-23 2024-05-21 Apple Inc. Point-of-view image warp systems and methods

Also Published As

Publication number Publication date
JP2016508652A (en) 2016-03-22
CN105074726A (en) 2015-11-18
US8718328B1 (en) 2014-05-06
EP2962247A1 (en) 2016-01-06
EP2962247A4 (en) 2016-09-14
KR20150122715A (en) 2015-11-02
WO2014133597A1 (en) 2014-09-04
US8831288B1 (en) 2014-09-09
CA2899401A1 (en) 2014-09-04

Legal Events

Date Code Title Description
AS Assignment

Owner name: SPINELLA IP HOLDINGS, INC., NEW JERSEY

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE APPLICATION NUMBER FROM "14/063,704" TO "14/217,655" PREVIOUSLY RECORDED ON REEL 033128 FRAME 0513. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:GADDY, WILLIAM L.;SERAN, VIDHYA;REEL/FRAME:033203/0548

Effective date: 20140131

AS Assignment

Owner name: A2ZLOGIX, INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SPINELLA IP HOLDINGS, INC.;REEL/FRAME:033475/0469

Effective date: 20140620

AS Assignment

Owner name: CHEYTEC TECHNOLOGIES, LLC, NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:A2ZLOGIX, INC.;REEL/FRAME:043484/0893

Effective date: 20170816

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.)

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20180909