WO2007145654A1 - Automatic compositing of 3d objects in a still frame or series of frames and detection and manipulation of shadows in an image or series of images - Google Patents


Info

Publication number
WO2007145654A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
objects
recited
shadow
pixels
Prior art date
Application number
PCT/US2006/040855
Other languages
French (fr)
Inventor
Barton S. Wells
Original Assignee
Aepx Animation, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 11/262,262 (US7477777B2)
Priority claimed from US 11/271,532 (US7305127B2)
Application filed by Aepx Animation, Inc.
Publication of WO2007145654A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/10: Geometric effects
    • G06T15/40: Hidden part removal
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/273: removing elements interfering with the pattern to be recognised

Definitions

  • the present invention relates to image rendering, and more particularly, this invention relates to automated, accurate compositing of three dimensional (3D) objects added to a two dimensional (2D) frame or series of 2D frames.
  • the present invention also relates to image processing, and more particularly, this invention relates to identification and manipulation of shadows in a still image or series of images.
  • a photograph is typically a single image frame of a real scene. Movies can be described as a series of frames that together form what appears to the human eye to be a continuously moving image. Both photographs and movies are now found in both physical and digital formats.
  • advances in technology have allowed creation of entirely three dimensional worlds.
  • 3D graphic systems are able to produce an image on a two-dimensional screen of a display in such a manner that the image simulates three-dimensional effects.
  • the surface of a 3D object to be represented is separated into a plurality of polygonal surfaces having various arbitrary shapes.
  • Picture data representing the polygonal areas of the 3D object are successively stored in a frame memory having memory locations corresponding to positions on a display screen to accumulate picture data which, when supplied to the display, reconstruct an image which appears to be three-dimensional.
  • the data representing each of the polygonal surfaces must be transformed in order to represent three-dimensional effects such as rotation of the object they represent.
  • the image data for the various polygonal surfaces are produced in succession based on data indicating the depth of each polygonal surface from the plane of the display screen.
  • Conventional 3D systems produce image data representing the polygonal surfaces of an object such that surfaces which cannot be seen from the point of view when displayed are produced first and stored in a display memory and the remaining data representing the polygonal surfaces are successively produced in order according to their depth from the screen. Consequently, image data representing a polygonal surface at the front of the object cover over the image data of reverse surfaces which previously were produced and stored. It is necessary, therefore, to include data indicating the depth of each polygonal surface (referred to as "Z data") and the order in which the data representing the polygons are produced is determined by reference to such Z data.
  • Z data: data indicating the depth of each polygonal surface
  • a Z buffer is provided to store the Z data in pixel units and the stored Z data are compared to determine a display preference.
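The per-pixel Z-buffer comparison described above can be sketched as follows; all function and variable names are illustrative, not taken from the patent:

```python
# Minimal Z-buffer sketch: a pixel's color is written only when its depth is
# nearer (smaller Z) than what the Z buffer already holds for that pixel.

def render_polygons(width, height, polygons):
    """polygons: list of (z_depth, color, pixel_list) tuples."""
    frame = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for z, color, pixels in polygons:
        for x, y in pixels:
            if z < zbuf[y][x]:      # nearer surface wins the pixel
                zbuf[y][x] = z
                frame[y][x] = color
    return frame
```

A nearer polygon drawn later still overwrites a farther one, so the stored Z data, not the drawing order, determines display preference.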
  • both methods simply overlay the 3D object over the 2D background image.
  • movies which add 3D objects to a background image of a real scene.
  • Current methods render the 3D object in a 3D renderer, and composite the 3D image on the 2D frames of a film. Then, artists must go back and, frame by frame, manually draw shadows and reflections on the 3D object. This is a very time consuming and thus expensive job, considering that a typical movie runs at about 30 frames per second.
  • If the 3D object is supposed to be positioned behind something in the frame, present systems require that a user manually create an image mask that is exactly the same size and shape as the 2D object to be shown in front of the 3D object.
  • Another mask, a shadow mask, is created by hand for the shadowing created by or cast onto the 3D object. Shadowing is currently performed by dimming the image, which is not an accurate representation of a shadow; the dimming appears as a fuzzy area rather than an accurate representation of how the shadow would be cast.
  • the typical method is to manually hand-draw a shadow mask for each frame by using ADOBE® PHOTOSHOP® or other manual graphics program.
  • reflection maps are texture maps that go on the 3D model. This requires the artists to estimate what the scene around the 3D object looks like, and map this onto the 3D model.
  • One problem is that the reflections do not look realistic, particularly on rounded or angled surfaces.
  • Heretofore methods have not been able to accurately create the natural deformation of the reflection due to curvature of the reflecting surface.
  • human artists find it very difficult to conceptualize this deformation and create it in the reflection in the composite image.
  • image processing software is currently used by millions of people worldwide to perform a variety of image processing functions, perhaps the most typical of which is manually changing colors of pixels to make the image appear more realistic. Examples include removal of "red-eye" from human subjects, and touching up of edges.
  • image processing software has heretofore been unable to accurately identify shadows in an image. Nor has it been able to accurately tie shadowed portions of an object to nonshadowed portions of the same object. As such, the ability to work with shadows in image processing software is limited and inaccurate at best. What is therefore needed is a way to automatically and accurately identify shadows in an image, and tie shadowed portions of objects to nonshadowed portions. This would open the door for improved photo processing capabilities, and allow automation of features that now require the manual user input. Such a solution would also reduce the inherent flaws in the effects heretofore manually created by human users. It would also be desirable to accurately remove, lighten, or darken shadows from images.
  • Prior methods of removing or lightening shadows merely lighten the entire image, giving the image a "washed out" look.
  • prior methods of darkening shadows darken the entire image. These methods fail to maintain the integrity of the nonshadowed portion of the image. Accordingly, the user is often required to revert to manual tools to adjust any shadowing.
  • the basic steps performed during creation of an image include analyzing a 2D image frame during a set up sequence, adding a 3D object to the frame, and adding visual effects to the frame with little or no user intervention, the visual effects including but not limited to shadowing, reflection, refraction, and transparency.
  • a 2D image is analyzed for determining several of its properties. These properties may include one or more of hue (H), saturation (S), brightness value (V), and red, green and blue (RGB) color intensity.
  • H: hue; S: saturation; V: brightness value; RGB: red, green and blue
  • H, S, V, R, G, and B values for each pixel are stored.
  • the image is smoothed and edges in the frame are detected. Lines are refined based on the detected edges.
  • the shadows in the 2D image are detected based on analysis of the HSV values of the pixel of interest and surrounding pixels. Shadows in the frame are matched with the edges to refine and further define the shadows in the frame.
  • Objects in the frame are identified, such as by using a flood fill algorithm to find areas outside shadows having similar H, S, V, R, G, or B values. This completes the setup of the 2D image frame in this embodiment.
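A flood-fill step of the kind described above might look like the following sketch, assuming a single-channel value grid and a similarity tolerance; both the grid representation and the tolerance test are illustrative, not the patent's exact criteria:

```python
from collections import deque

# Flood fill for object finding: starting from a seed pixel, grow a region
# over 4-connected neighbors whose channel values are within a tolerance of
# the seed's value.

def flood_fill(values, seed, tol):
    """values: 2D grid of per-pixel values (e.g. one channel); returns the
    set of (row, col) coordinates in the seed's region."""
    rows, cols = len(values), len(values[0])
    base = values[seed[0]][seed[1]]
    region, frontier = {seed}, deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(values[nr][nc] - base) <= tol):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region
```

In practice the similarity test would consider all of H, S, V, R, G, and B rather than a single channel.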
  • One or more 3D objects are added to the frame. Occulting is performed to provide the effect of depth.
  • the objects of the 2D image are analyzed to intelligently estimate which objects in the frame are in front of or behind the 3D object based on the position of the object in the frame relative to the 3D object. Since the shape of each 3D object in every direction is known, ray tracing can be used to intelligently add visual effects to the composite image. For example, reflections of one or more of the objects can be rendered on the 3D object, taking into account any curvature of the 3D object. The shadows found are adjusted to be properly cast onto the surface of the 3D object.
  • a representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object may be rendered on the outer surface of the 3D image. This may include effective rendering on the outer surface of the 3D image by merely shading a portion of the background image that would be viewable through the transparent portion of the 3D object. Any portion of the 2D image viewable through a transparent or semi-transparent portion of the 3D object may be adjusted to reflect the effect of refraction caused by the transparent portion of the 3D object. Other effects can also be added. For example, atmospheric effects can be estimated from analysis of the 2D image and added to the 3D object.
  • Manipulation of shadows includes such things as adjustment of shadow boundaries, softening of shadow boundaries, adjustment of shadow properties, and even removal of shadows altogether.
  • a method for identifying pixels in shadow in an image includes analyzing an image for determining several properties thereof, the properties including hue (H), saturation (S), brightness (V), red color content (R), green color content (G), and blue color content (B).
  • a first histogram of the H/S values calculated for each pixel in the image is created.
  • a second histogram of the S/V values calculated for each pixel in the image is also created.
  • a line feature in the first histogram is identified.
  • a line feature in the second histogram is identified. Pixels having an H/S value above the line feature in the first histogram and an S/V value above the line feature in the second histogram are marked as being in shadow.
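As a hedged sketch of the method just described: per-pixel H/S and S/V ratios are binned into histograms (as in FIGS. 8 and 9), and pixels above both line features are marked as shadow. How the line features themselves are located is not specified in this excerpt, so they are passed in as plain thresholds (`hs_line`, `sv_line`), and all names and bin ranges are illustrative:

```python
def ratio_histogram(ratios, bins=16, lo=0.0, hi=4.0):
    """Bin per-pixel ratios (H/S or S/V) so a line feature can be sought."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for r in ratios:
        idx = min(bins - 1, max(0, int((r - lo) / width)))
        counts[idx] += 1
    return counts

def shadow_pixels(pixels, hs_line, sv_line):
    """pixels: list of (H, S, V) tuples; returns indices of pixels in shadow.
    hs_line / sv_line stand in for the two histogram line features."""
    shadow = []
    for i, (h, s, v) in enumerate(pixels):
        hs = h / s if s else 0.0   # H/S ratio
        sv = s / v if v else 0.0   # S/V ratio
        if hs > hs_line and sv > sv_line:   # above both line features
            shadow.append(i)
    return shadow
```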
  • a method for manipulating shadows of an image includes analyzing an image for determining several properties thereof. A determination is made as to which pixels of the image are in shadow. A shadow boundary adjustment and/or shadow boundary softening of the image is performed for a group of pixels determined to be in shadow, an outer periphery of the pixels determined to be in shadow defining the shadow boundary.
  • a method for identifying an object in an image includes analyzing an image for determining several properties thereof. Objects in the image are found based at least in part on the properties of the image. A first object determined to be at least primarily in shadow is selected. A second object adjacent the first object is also selected. The two objects are analyzed using the properties of the image for determining whether the two objects are part of a same overall object.
  • a method for removing shadows from an image includes analyzing an image for determining several properties thereof. Objects in the image are found based at least in part on the properties of the image. A first object determined to be at least primarily in shadow is selected. A second object adjacent the first object is also selected, where the second object is not primarily in shadow. The two objects are analyzed using the properties of the image for determining whether the two objects are part of a same overall object. New red (R) values are calculated for pixels in the first object based on a ratio of an average R value of the pixels in the first object to an average R value of pixels in the second object.
  • New green (G) values are calculated for pixels in the first object based on a ratio of an average G value of the pixels in the first object to an average G value of pixels in the second object.
  • New blue (B) values are calculated for pixels in the first object based on a ratio of an average B value of the pixels in the first object to an average B value of pixels in the second object.
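A sketch of the channel-ratio shadow removal described above: each channel of the shadowed object is rescaled so its average matches the adjacent lit portion of the same overall object. The direction of the ratio (lit-to-shadowed) and the clamping to 255 are assumptions of this sketch:

```python
# Shadow removal by per-channel average matching, per the description above.

def remove_shadow(shadowed, lit):
    """shadowed, lit: lists of (R, G, B) pixels from the two adjacent objects;
    returns corrected pixel values for the shadowed object."""
    def avg(pixels, c):
        return sum(p[c] for p in pixels) / len(pixels)
    # scale each channel so the shadowed average matches the lit average
    scale = [avg(lit, c) / avg(shadowed, c) for c in range(3)]
    return [tuple(min(255, round(p[c] * scale[c])) for c in range(3))
            for p in shadowed]
```

Because the scaling is per channel, the corrected region keeps its own texture rather than acquiring the "washed out" look of whole-image lightening.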
  • the invention can be implemented entirely in hardware, entirely in software, or a combination of the two.
  • the invention can also be provided in the form of a computer program product comprising a computer readable medium having computer code thereon.
  • FIG. 1 illustrates a hardware system useable in the context of the present invention.
  • FIG. 2A is a flow diagram of a high level process of an embodiment of the present invention.
  • FIG. 2B is a flow diagram of a high level process of another embodiment of the present invention.
  • FIG. 3A is a flow diagram of a process according to one embodiment of the present invention.
  • FIG. 3B is a flow diagram of a process according to one embodiment of the present invention.
  • FIG. 4 illustrates a mask for neighborhood averaging, used during a smoothing process according to one embodiment of the present invention.
  • FIGS. 5A-B depict Sobel masks, used during an edge detection process according to one embodiment of the present invention.
  • FIG. 6 illustrates processing using a Sobel mask.
  • FIG. 7 depicts a Laplace mask, used during an edge detection process according to one embodiment of the present invention.
  • FIG. 8 illustrates an H/S histogram.
  • FIG. 9 illustrates an S/V histogram.
  • FIG. 10 illustrates a mask used during shadow detection according to one embodiment of the present invention.
  • FIG. 11 depicts a color wheel.
  • FIG. 12A illustrates an image generated by a shadow detection process.
  • FIG. 12B illustrates an image having enhanced shadow definition after a shadow detection process.
  • FIG. 13 depicts a pixel array after shadow edge softening.
  • FIG. 14 depicts a mask used during a shadow softening process.
  • FIG. 15 illustrates a portion of an image during a histogram-based object recognition process.
  • FIG. 16 depicts a representation of bins used during a histogram-based object recognition process.
  • FIG. 17 illustrates a representative histogram used during a histogram-based object recognition process.
  • FIG. 18 is a flow diagram of a process for determining which objects are in the same overall object and removing shadows.
  • the following description is the best embodiment presently contemplated for carrying out the present invention. This description is made for the purpose of illustrating the general principles of the present invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations.
  • the following specification describes systems, methods, and computer program products that provide broadcast-quality photo-realistic rendering of one or more 3D objects added to a still frame (photo mode), or series of frames (movie mode). Effects such as reflection, shadows, transparency, and refraction on or of the 3D object(s) relative to objects in the frame are automatically determined and added to the composite image.
  • the software also dynamically determines what object in the frame should be in front of or behind the 3D object(s) and places those objects in front of or behind the 3D object(s), as well as creates realistic reflections of the frame objects on the 3D object(s).
  • the following specification also describes systems, methods, and computer program products that identify and allow manipulation of shadows in a still frame (photo mode), or series of frames (movie mode).
  • Manipulation of shadows includes such things as adjustment of shadow boundaries, softening of shadow boundaries, adjustment of shadow properties, and even removal of shadows altogether.
  • the invention can be implemented entirely in hardware, entirely in software, or a combination of the two.
  • the invention can also be provided in the form of a computer program product comprising a computer readable medium having computer code thereon.
  • a computer readable medium can include any medium capable of storing computer code thereon for use by a computer, including optical media such as read only and writeable CD and DVD, magnetic memory, semiconductor memory (e.g., FLASH memory and other portable memory cards, etc.), etc.
  • a computer for storing and/or executing the code and/or performing the processes described herein can be any type of computing device, including a personal computer (PC), laptop computer, handheld device (e.g., personal digital assistant (PDA)), portable telephone, etc.
  • FIG. 1 illustrates a computer 100 according to one embodiment.
  • the computer 100 includes a system bus 102 to which a processor 104 is coupled.
  • the processor 104 executes instructions found in the code, and controls several of the other components of the computer 100.
  • Memory including Random Access Memory (RAM) 106 and nonvolatile memory 108 (e.g., hard disk drive) store the code or portions thereof, as well as data, during performance of the processes set forth herein.
  • a graphics rendering subsystem 110 may also be present, and can include a separate graphics processor and additional memory.
  • I/O devices are also present.
  • User input devices such as a keyboard 112 and mouse 114 allow a user to provide user instructions to the computer 100.
  • a monitor 116 or other display device outputs graphical information to the user.
  • the display device can be coupled to the graphics subsystem 110 instead of directly to the bus 102.
  • a network interface 118 may also be provided to allow the computer 100 to connect to remote computing devices for a variety of purposes including data upload, data download, etc.
  • a media reader 120 such as a DVD player or FLASH memory port may be present for reading code from a computer readable medium 122.
  • FIG. 2A depicts the high level process 200 performed during creation of an image.
  • a 2D image is analyzed during a set up sequence.
  • 3D object data is added to the scene.
  • visual effects including but not limited to shadowing, reflection, refraction, and transparency are calculated for the 3D object and/or 2D image with little or no user intervention.
  • the 3D image is then rendered into the frame with the calculated effects, along with any effects created by the 3D object on the 2D image, thereby creating a realistic, accurate, high-quality composite image.
  • FIG. 3A illustrates the general process 300 performed by a preferred embodiment of the present invention.
  • the 2D image is analyzed for determining several of its properties. These properties include hue (H), saturation (S), brightness value (V), and red, green and blue (R, G, B) color intensity. H, S, V, R, G, and B values for each pixel are stored in a frame buffer or buffers.
  • the image is smoothed.
  • edges in the 2D image are detected.
  • lines are refined based on the detected edges.
  • the shadows in the 2D image are detected, e.g., based on analysis of the HSV values of the pixel of interest and surrounding pixels.
  • shadows in the frame are matched with the edges found in operation 306 to refine and further define the shadows in the frame.
  • objects on the frame are found, e.g., using a flood fill algorithm to find areas outside shadows having similar H, S, V, R, G, and B values. This completes the setup of the 2D image frame.
  • In operation 316, data relating to one or more 3D objects are added to the frame.
  • In operation 318, the objects identified in operation 314 are analyzed to estimate which objects in the frame are in front of the 3D object based on the position of the object in the frame relative to the 3D object.
  • ray tracing can be used to intelligently add visual effects to the composite image.
  • reflections of one or more of the objects found in operation 314 are rendered on the 3D object, taking into account any curvature of the 3D object.
  • the shadows found in operations 310 and 312 are adjusted to be properly cast onto the surface of the 3D object.
  • a representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object is rendered on the outer surface of the 3D image.
  • operation 324 includes effective rendering on the outer surface of the 3D image by merely shading a portion of the background image that would be viewable through the transparent portion of the 3D object.
  • In operation 326, any portion of the 2D image viewable through a transparent or semi-transparent portion of the 3D object is adjusted to reflect the effect of refraction caused by the transparent portion of the 3D object.
  • FIG. 2B depicts a high level process 250 performed during image processing according to one embodiment.
  • an image is analyzed during a set up sequence.
  • shadows in the image are identified.
  • In operation 256, the image is processed for refining and/or manipulating the shadows.
  • In operation 258, the image is then rendered with the manipulated shadows.
  • the image is analyzed for determining several of its properties. These properties include hue (H), saturation (S), brightness value (V), and red, green and blue (R, G, B) color intensity. H, S, V, R, G, and B values for each pixel are stored in a frame buffer or buffers.
  • the image is smoothed and leveled.
  • edges in the image are detected.
  • lines are refined based on the detected edges.
  • the shadows in the image are detected, e.g., based on analysis of the HSV values of the pixel of interest and surrounding pixels.
  • shadows in the image are matched with the edges found in operation 356 to refine and further define the shadow boundaries.
  • shadow edges are softened.
  • objects on the image are found, e.g., using a flood fill algorithm to find areas inside and outside shadows having similar H, S, V, R, G, and B values.
  • objects in shadow and out of shadow are correlated to define overall objects.
  • the shadows can be lightened, darkened, and/or removed.
  • Other embodiments of the present invention may perform only a subset of the foregoing steps and/or additional steps. Further, the order in which the steps of these and other disclosed processes are presented is by way of example only, and is in no way meant to require that the present invention perform the steps in the same order presented. Rather, the various steps (or portions thereof) can be performed in any order.
  • the first stage is preparing the image for rendering.
  • An image is loaded into the host system.
  • the image can be, for example, a high quality natural image captured with a digital camera.
  • the image can also be scanned from a picture, generated from film negatives, etc.
  • the image can further be a purely computer-generated 2D or 3D image.
  • Hue (H), Saturation (S), and Brightness Value (V) are determined for each pixel in the image. Hue (H) refers to the relative color of the pixel on the red-green-blue color wheel.
  • Saturation (S) is the degree of color depth. For example, a pure red is completely saturated while pure white is completely non-saturated.
  • the Brightness Value (V) indicates the brightness level of the pixel.
  • the Saturation (S) may be determined as follows. For each pixel, determine which color (R, G, or B) has the highest and lowest brightness (V and Vmin, respectively). RGB brightness values in one example vary from 0 to 255 (256 shades per R, G or B). The value V is set to the largest brightness value from among the three colors, i.e., the highest brightness of the three colors. For instance, a pixel in the sky portion of a landscape image would likely have blue as the brightest color. If so, V is set to the blue brightness value. Likewise, Vmin is set to the lowest brightness value. S can be determined by the following equation:
  • S = (V - Vmin) / V
  • If the pixel is black (V = 0), S can be set to zero.
  • Hue (H) may then be determined from the color distances, where Vred is the brightness value of the red color for that pixel, Gdist is the green distance, Bdist is the blue distance, and Rdist is the red distance.
  • If red is the brightest color: H = (Bdist - Gdist) x 60
  • If blue is the brightest color: H = 240 + (Gdist - Rdist) x 60
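The HSV derivation above can be sketched as follows. The excerpt names the color distances (Rdist, Gdist, Bdist) without defining them, so the conventional definition Xdist = (V - X) / (V - Vmin) is assumed here, along with the usual 120-offset branch for green:

```python
# RGB -> HSV following the description above (V in 0-255, S in 0-1, H in degrees).

def rgb_to_hsv(r, g, b):
    v = max(r, g, b)                # brightest channel
    vmin = min(r, g, b)             # dimmest channel
    if v == 0:                      # black pixel: S set to zero
        return 0.0, 0.0, 0
    s = (v - vmin) / v              # saturation
    if v == vmin:                   # gray pixel: hue undefined, use 0
        return 0.0, s, v
    rdist = (v - r) / (v - vmin)    # assumed color-distance definition
    gdist = (v - g) / (v - vmin)
    bdist = (v - b) / (v - vmin)
    if v == r:                      # red brightest
        h = (bdist - gdist) * 60
    elif v == g:                    # green brightest (branch assumed)
        h = 120 + (rdist - bdist) * 60
    else:                           # blue brightest
        h = 240 + (gdist - rdist) * 60
    return h % 360, s, v
```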
  • H, S, and V images of the image are created and stored in a frame buffer or buffers.
  • the frame buffer may be a type of RAM, nonvolatile memory such as a magnetic disk, etc. If there are several images, such as in a movie mode, each image of the series of images is analyzed, and HSV images are created and stored for each image. R, G, and B values for each pixel in the image are also stored in a frame buffer or buffers.
  • Leveling data may be created and stored for use in subsequent processing such as object identification.
  • An R, G, B histogram is obtained of the whole image.
  • the number of pixels in each shade are stored in bins representing each component from 0 to 255. For example, if 8,201 pixels have an R shade of 35, the bin for shade 35 will have a value of 8,201.
  • bin 0 is summed with the next bin or bins until the sum > 3906. Supposing bins 0-2 total 4000, then all pixels associated with bins 0-2 are given a first shade. Then bin 3 is added to the next bin or bins until the sum > 3906, and all pixels associated with bins 3-n are given a second shade. Thus, the color contrast is maximized. This in turn is useful for such things as object recognition using flood fill algorithms, as discussed below.
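The leveling step above can be sketched as follows. The 3906 figure appears to correspond to total/256 for a roughly one-million-pixel image, so the sketch uses total/256 as the group target; the sequential shade numbering (first shade, second shade, ...) follows the text, and spreading those shades back over the full 0-255 range is left out:

```python
# Leveling: group histogram bins until each group holds more than total/256
# pixels, then map every bin in a group to one output shade.

def level_map(hist):
    """hist: 256 bin counts for one channel; returns a bin -> shade mapping."""
    target = sum(hist) / 256        # e.g. 3906 for a ~1,000,000-pixel image
    mapping = [0] * 256
    shade, acc, start = 0, 0, 0
    for i, count in enumerate(hist):
        acc += count
        if acc > target:
            for j in range(start, i + 1):   # bins start..i share one shade
                mapping[j] = shade
            shade += 1
            acc, start = 0, i + 1
    for j in range(start, 256):     # leftover bins get the final shade
        mapping[j] = shade
    return mapping
```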
  • edges in the image are identified.
  • edge detection is performed to detect edges in the image (or frame), as edges tend to denote boundaries between or on objects.
  • images usually have some sort of "noise" or edge distortion due to the inherent irregularities in real-life objects, as well as the limitations in the camera or scanner and as an effect of any compression.
  • An example of image noise in a real-life object is found in concrete with a rock facade, where each rock has an edge and so appears to be its own object.
  • a smoothing process is preferably performed prior to detecting the edges.
  • the purpose of noise smoothing is to reduce various spurious effects of a local nature in the image, caused perhaps by noise in the image acquisition system, or arising as a result of compression of the image, for example as is typically done automatically by consumer-grade digital cameras.
  • the smoothing can be done either by considering the real space image, or its Fourier transform.
  • the simplest smoothing approach is neighborhood averaging, where each pixel is replaced by the average value of the pixels contained in some neighborhood about it.
  • the simplest case is probably to consider the 3x3 group of pixels centered on the given pixel, and to replace the central pixel value by the unweighted average of these nine pixels.
  • the central pixel in the mask 400 of FIG. 4 is replaced by the value 13 (the nearest integer to the average).
  • a more preferable approach is to use a median filter.
  • a neighborhood around the pixel under consideration is used, but this time the pixel value is replaced by the median pixel value in the neighborhood.
  • the 9 pixel values are written in sorted order, and the central pixel is replaced by the fifth highest value. For example, again taking the data shown in FIG. 4, the central pixel is replaced by the value 12.
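The two 3x3 smoothing filters just described, applied to one neighborhood at a time; the sample values in the test are chosen to echo the 13/12 outcomes described for FIG. 4, since the actual mask values are not reproduced here:

```python
# Neighborhood-averaging and median filters over a 3x3 neighborhood.

def mean_smooth(neigh):
    """neigh: nine pixel values; replacement is the nearest integer to the
    unweighted average of the nine pixels."""
    return round(sum(neigh) / 9)

def median_smooth(neigh):
    """replacement is the fifth value in sorted order (the median of nine)."""
    return sorted(neigh)[4]
```

The median filter discards outliers entirely, which is why it handles impulse noise better than plain averaging.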
  • Gaussian smoothing is performed by convolving an image with a Gaussian operator which is defined below.
  • the Gaussian outputs a "weighted average" of each pixel's neighborhood, with the average weighted more towards the value of the central pixels.
  • the Gaussian distribution function in two variables, g(x,y), is defined by:
  • g(x,y) = (1 / (2πσ²)) e^(-(x² + y²) / (2σ²))
  • where σ is the standard deviation representing the width of the Gaussian distribution.
  • the shape of the distribution and hence the amount of smoothing can be controlled by varying σ.
  • a further way to compute a Gaussian smoothing with a large standard deviation is to convolve an image several times with a smaller Gaussian. While this is computationally complex, it is practical if the processing is carried out using a hardware pipeline.
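A discrete Gaussian operator of the kind described above can be built by sampling g(x,y) on an integer grid and normalizing so the weights sum to one (preserving overall brightness); the radius parameter is an implementation choice of this sketch:

```python
import math

# Discrete Gaussian smoothing kernel sampled from
# g(x, y) = (1 / (2*pi*sigma^2)) * exp(-(x^2 + y^2) / (2*sigma^2)).

def gaussian_kernel(radius, sigma):
    """Returns a (2*radius+1) square kernel of normalized weights."""
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in k)     # normalization drops the 1/(2*pi*s^2)
    return [[v / total for v in row] for row in k]
```

Convolving the image with this kernel gives the weighted average described above, with the center pixel weighted most heavily.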
  • the edges in the smoothed image can be detected.
  • edge detection There are many ways to perform edge detection.
  • the gradient method detects the edges by looking for the maximum and minimum in the first derivative of the image.
  • the Laplacian method searches for zero crossings in the second derivative of the image to find edges.
  • One suitable gradient edge detection algorithm uses the Sobel method.
  • the Sobel operator performs a 2-D spatial gradient measurement on an image. Typically it is used to find the approximate absolute gradient magnitude at each point in an input grayscale image.
  • the Sobel edge detector uses a pair of 3x3 convolution masks, one estimating the gradient in the x-direction (columns) and the other estimating the gradient in the y- direction (rows).
  • a convolution mask is usually much smaller than the actual image. As a result, the mask is slid over the image, manipulating a square of pixels at a time.
  • Illustrative Sobel masks 500, 502 are shown in FIGS. 5 A and 5B, respectively.
  • the magnitude of the gradient is then calculated using the formula: |G| = sqrt(Gx² + Gy²)
  • An approximate magnitude can be calculated using: |G| = |Gx| + |Gy|
  • a pixel location is declared an edge location if the value of the gradient exceeds some threshold. As mentioned above, edges will have higher pixel intensity values than those surrounding it. So once a threshold is set, the gradient value can be compared to the threshold value and an edge detected whenever the threshold is exceeded.
  • the mask is positioned over an area of the input image, that pixel's value is changed, and then the mask is shifted one pixel to the right. This sequence continues to the right until it reaches the end of a row. The procedure then continues at the beginning of the next row.
  • the example in FIG. 6 shows the mask 500 being slid over the top left portion of the input image 600 represented by the heavy outline 602.
  • the formula below shows how a particular pixel in the output image 604 can be calculated.
  • the center of the mask is placed over the pixel being manipulated in the image. It is important to notice that pixels in the first and last rows, as well as the first and last columns cannot be manipulated by a 3x3 mask. This is because when placing the center of the mask over a pixel in the first row (for example), the mask will be outside the image boundaries.
  • the approximate direction of an edge can further be calculated by assuming the angle of the edge is the inverse tangent of Δy/Δx: θ = arctan(Δy/Δx)
  • the change in value in the x direction and the change in value in the y direction are used to obtain an approximate angle for that edge.
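The Sobel steps above (mask sliding, magnitude, thresholding, and direction) can be sketched in pure Python on nested lists. This is a non-authoritative illustration: the mask values are the standard Sobel kernels, which are assumed to match the masks 500, 502 of FIGS. 5A and 5B, and the threshold handling is a simple comparison as described in the text.

```python
import math

# Standard Sobel masks (assumed to correspond to FIGS. 5A/5B).
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]    # gradient in the x-direction (columns)
GY = [[ 1,  2,  1],
      [ 0,  0,  0],
      [-1, -2, -1]]  # gradient in the y-direction (rows)

def sobel(image, threshold):
    """Return (magnitude, angle, is_edge) maps for a grayscale image.

    Pixels in the first/last rows and columns are skipped, since a 3x3
    mask centered on them would fall outside the image boundaries.
    """
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    edge = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            # |G| = sqrt(Gx^2 + Gy^2); abs(gx) + abs(gy) is the cheaper approximation
            mag[y][x] = math.sqrt(gx * gx + gy * gy)
            ang[y][x] = math.atan2(gy, gx)  # approximate edge direction
            edge[y][x] = mag[y][x] > threshold
    return mag, ang, edge
```

A vertical step in intensity produces a strong Gx response and a near-zero Gy response, so the computed angle is approximately zero along such an edge.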
  • edges will have higher pixel intensity values than those surrounding them. So once a threshold is set, the gradient value can be compared to the threshold value and an edge detected whenever the threshold is exceeded. Furthermore, when the first derivative is at a maximum, the second derivative is zero. As a result, another alternative to finding the location of an edge is to locate the zeros in the second derivative. This method is known as the Laplacian.
  • the 5x5 Laplacian uses a convolution mask to approximate the second derivative, unlike the Sobel method, which approximates the gradient.
  • Laplace uses one 5x5 mask for the second derivative in both the x and y directions.
  • the Laplace mask 700 is shown in FIG. 7. At this point in the processing, the edges and directions (angles) of the edges have been calculated for all of the objects in the image or frame.
  • next step is to identify the ends of the lines.
  • the following equation can be used to find the ends of the line:
  • Line density = (number of potential pixels + 5F − 2E) / number of potential pixels
  • where F = number of full pixels and E = number of empty pixels.
  • the multipliers 5 and 2 are provided by way of example, and can be varied depending on the desired sensitivity. Every pixel along a line is considered a potential pixel, and so each step along the line increases the potential pixel count by one. Every time a pixel is filled in, the count of F is increased by one. When a potential pixel is not filled in, the count of E is increased by one.
  • the gap is filled in.
  • the result is a solid line.
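The line-density equation and gap filling above can be sketched as follows. This is an illustrative reading of the text, not the patented implementation: the line is modeled as a run of booleans (True = filled pixel), and the fill threshold of 3.0 is an assumed value chosen for the example.

```python
def line_density(pixels, full_weight=5, empty_weight=2):
    """Line density for a run of pixels along a detected line.

    Every entry in `pixels` is a potential pixel. The 5 and 2 multipliers
    follow the example in the text and can be tuned for sensitivity.
    """
    potential = len(pixels)
    full = sum(1 for p in pixels if p)      # count of F
    empty = potential - full                # count of E
    return (potential + full_weight * full - empty_weight * empty) / potential

def fill_gaps(pixels, threshold=3.0):
    """Fill the gaps in a mostly-solid line (assumed threshold value)."""
    if line_density(pixels) >= threshold:
        return [True] * len(pixels)         # result is a solid line
    return list(pixels)
```

A run that is mostly filled scores well above a mostly-empty run, so small gaps in an otherwise solid line get filled in while sparse noise does not.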
  • the foregoing sets up the image for subsequent processing.
  • the data heretofore generated can be stored in one or more frame buffers.
  • results of the subsequent processing steps can also be stored in one or more frame buffers.
  • the sequence continues by identifying shadows in the image.
  • a preferred method for identifying shadows analyzes how the H, S and V values change across the image using the HSV data previously stored during set up. Thus, the procedure is not just looking for an area that is darker, but rather compares the properties in a given pixel to the properties in surrounding pixels.
  • Two histograms are created.
  • One histogram is generated by analyzing H/S values for each pixel.
  • a representative H/S histogram 800 is shown in FIG. 8. Supposing there are 256 possible values of H/S, the resulting H/S values are tabulated, e.g., in bins from 0-255, the value in each bin increasing each time an H/S value matching that bin is detected. The tabulated values are then plotted in a histogram.
  • the second histogram is generated by analyzing S/V values for each pixel.
  • a representative S/V histogram 900 is shown in FIG. 9. Again, the number of occurrences of a particular value can be tabulated in bins.
  • the histograms tend to have a feature in the plot line, such as a peak or valley, corresponding to the shadow edges. How these tendencies are taken advantage of is presented next. The scaling from 0 to 255 is arbitrary, chosen for consistency with the other histograms described herein.
  • shadow detection continues by comparing H/S for each pixel with the H/S histogram 800.
  • the inventor has found that the histogram will typically have a line feature 802 such as a valley defined by two peaks. Any pixels having an H/S value to the left of the valley are flagged as potentially not in a shadow. Any pixels having an H/S value to the right of the valley are flagged as potentially being in a shadow. Pixels on the line feature, e.g., in the valley, may be marked as potentially being in shadow or potentially not being in shadow, depending on a default setting or user preference.
  • the line feature is typically around 1.0 on a 0.0-10.0 point scale (or about 25-26 on a 0-255 scale). In several image processing experiments, the line feature was found to typically lie just above 1.0 on a 10.0 point scale.
  • the S/V for each pixel is also compared with the S/V histogram 900.
  • the inventor has found that the S/V histogram will typically have a line feature 902 such as a peak somewhere around 1 on a 10-point scale (scaled from 0-255). Any pixels having an S/V value to the left of the peak are flagged as potentially not in a shadow. Any pixels having an S/V value to the right of the peak are flagged as potentially being in a shadow. Pixels on the line feature, e.g., on the peak, may be marked as potentially being in shadow or potentially not being in shadow, depending on a default setting or user preference.
  • the line feature is typically around 1 on a 0.0 to 10 point scale (or about 25-26 on a 0-255 scale). In several image processing experiments, the line feature was found to typically lie just above 1.0 on a 10.0 point scale.
  • the threshold value for determining whether to flag the pixel as in shadow or not can be set at a default value, e.g., 1.0. Further, a user may be allowed to modify the flagging thresholds of the histograms in order to further define what is considered to be in shadow or not.
  • flagging a pixel as potentially in shadow or not is not limited to actual marking of the data; it also includes flagging by omission, e.g., marking pixels in shadow and not marking pixels not in shadow, so that whether a pixel is marked indicates whether it is in shadow.
  • given thresholds P and Q as determined from the histograms, if H/S > P and S/V > Q for a particular pixel, then the pixel is considered to be in a shadow.
  • H is important because the bluer the pixel H is over S, the more likely it is to be in a shadow. Accordingly, the position and area of a shadow in an image can be estimated. This gives a very good approximation of where the shadows are located.
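The H/S and S/V histogram test above can be sketched as a simple classifier. This is an illustrative sketch, not the patented method: the function names are hypothetical, the thresholds P and Q are passed in directly rather than located automatically from the valley/peak line features, and a small epsilon guards against division by zero (an assumption the text does not address).

```python
def shadow_mask(hsv_pixels, p, q):
    """Flag each (H, S, V) pixel as in shadow when H/S > P and S/V > Q.

    P and Q would be read off the line features of the H/S and S/V
    histograms (typically around 1.0 on a 10-point scale).
    """
    eps = 1e-6  # assumed guard against zero S or V
    mask = []
    for h, s, v in hsv_pixels:
        hs = h / max(s, eps)
        sv = s / max(v, eps)
        mask.append(hs > p and sv > q)
    return mask

def ratio_histogram(values, bins=256, vmax=10.0):
    """Tabulate ratio values into bins 0..bins-1 (a 0-10 scale mapped to 0-255)."""
    counts = [0] * bins
    for val in values:
        idx = min(int(val / vmax * (bins - 1)), bins - 1)
        counts[idx] += 1
    return counts
```

A bluish, dark, saturated pixel lands above both thresholds and is flagged; a bright unsaturated pixel is not.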
  • An illustrative 3x3 mask 1000 is shown in FIG. 10.
  • H runs from 0 to 360 degrees on a color wheel.
  • a shadow will typically not be a consistent shade, due in part to the texture of the shadowed image, and in part to the inherent variations in image quality due to such things as equipment (camera) flaws, film variations, and compression effects. These effects may cause errors in the estimation of the shadows on the image.
  • FIG. 12A illustrates how a shadow 1200 may have a jagged edge, for example. Taking advantage of the fact that the edges 1202 in the image have already been detected, the image is processed both horizontally and vertically to match the shadow 1200 to the nearest edge 1202. If the shadow 1200 is within a certain proximity to the edge 1202, the shadow boundary is extended to meet the edge 1202.
  • a shadow boundary adjustment process includes scanning the image line by line both horizontally and vertically. When an edge is detected (as previously found during edge detection), a number of pixels between the edge and pixels in shadow is determined. If that number is within a predetermined number of pixels, e.g., 2-10 pixels, the shadow boundary is extended back to the edge.
  • a similar search can be performed where the pixels being analyzed are in shadow, scanning beyond the shadow boundary for a nearby edge.
  • the number of pixels to search beyond the edge or shadow boundary can be based on one or many things. Several examples follow. The number of pixels can be based on the histograms. This is because the shadow pixels along the edges may have fallen on the wrong side of the line features in the histograms.
  • the number of pixels to search beyond the edge or shadow end can also be an arbitrary number defined by the user, can be preset, can be based on a percentage of total pixels in the image, etc.
  • the number of pixels to search can also be based on a percentage, e.g., 1%, of the width or height.
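The line-by-line shadow boundary adjustment above can be sketched for a single horizontal scan line. This is a non-authoritative sketch: the function name is hypothetical, only the rightward search direction is shown (the full process scans both horizontally and vertically, in both directions), and the 5-pixel gap is an assumed value within the 2-10 pixel range given in the text.

```python
def extend_shadow_to_edges(shadow_row, edge_row, max_gap=5):
    """One horizontal pass of shadow boundary adjustment.

    For each previously detected edge pixel, search up to `max_gap`
    pixels to the right; if shadow is found within that distance, the
    shadow boundary is extended back to the edge.
    """
    out = list(shadow_row)
    for x, is_edge in enumerate(edge_row):
        if not is_edge:
            continue
        for d in range(1, max_gap + 1):
            if x + d < len(out) and out[x + d]:
                # fill the gap between the edge and the shadow boundary
                for k in range(x + 1, x + d):
                    out[k] = True
                break
    return out
```

The same pass would then be repeated column-wise for the vertical direction, so jagged shadow boundaries snap to nearby detected edges in both axes.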
  • the edges of the shadow are preferably softened as part of the extending of the shadow to the edges as described above.
  • One way to do this is to mark how much a pixel is in shadow along some scale, e.g., 0-255, where 0 is out of shadow, and 255 is completely in shadow. Assume an edge now abuts a shadow to the right of the edge, as shown in the pixel array 1300 of FIG. 13. Before processing, pixels to the left of the edge have a value of 0, while pixels to the right of the edge pixel have a value of 255. To soften the edges of the shadow, the pixel 1302 to the left of the edge pixel 1304 is given a value of 0, and the properties of the pixels to the right of the edge pixel 1304 are adjusted to a percentage of full shadow properties.
  • some scale e.g., 0-255, where 0 is out of shadow, and 255 is completely in shadow.
  • the rate of shadow softening can be based on a linear interpolation, as shown. It is also desirable to soften the shadow edges where the shadow pixels already meet the pre-identified edges, in order to avoid a sharp change in contrast from shadow to non-shadow pixels. To achieve this, every pixel that is supposed to be in a shadow, and its neighbors, are analyzed to determine how many of its neighbors are also in the shadow. This can be performed using, e.g., a 3x3 mask or 5x5 mask. As shown in the figure, the center pixel is given a shadow adjustment value, e.g., of 6/9, which lightens the center pixel by 66.66%. If all pixels in the mask are in a shadow, the center pixel is given a shadow adjustment of 9/9, i.e., no lightening. Note that this may eliminate the aforementioned softening element in a combined shadow extension/softening process. At this point, all of the shadows have been identified.
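The neighbor-counting softening step above can be sketched with a 3x3 mask over a boolean shadow mask. This is an illustrative sketch under stated assumptions: the function name is hypothetical, and out-of-bounds neighbors are simply not counted, which the text does not specify.

```python
def soften_shadow(mask):
    """Give each in-shadow pixel an adjustment n/9, where n counts how many
    of the 3x3 mask positions (including the pixel itself) are also in
    shadow. 9/9 means fully in shadow (no lightening); smaller fractions
    lighten the pixel near the shadow boundary.
    """
    h, w = len(mask), len(mask[0])
    adj = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            n = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and mask[yy][xx]:
                        n += 1
            adj[y][x] = n / 9.0
    return adj
```

Interior shadow pixels receive 9/9 and are untouched, while pixels along the shadow boundary receive fractional adjustments, producing the soft falloff described above.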
  • the process next identifies the objects in the image. Note that object identification can be performed earlier, but finding the shadows first results in some efficiency benefits, as will soon become apparent.
  • the H, S, and V images can be smoothed. The inventor has found through experimentation that the following processes work better after smoothing.
  • One method for identifying objects is to execute a flood fill algorithm of a type known in the art that groups all areas within a certain color range of each other. Each grouped area can then be considered an object.
  • a more accurate way to use a flood filling algorithm is to use the average differences in H, S, V, R, G and B to determine how much areas of the image vary in color and H, S and V.
  • a mask e.g., 3x3 mask
  • a flood fill algorithm is executed to find areas of similar pixels outside the shadows. This allows the program to compensate for such things as faded or bright areas on an object, etc.
  • An illustrative flood fill algorithm looks at a pixel and compares its H, S, V, R, G, and/or B values to those of its neighboring pixels. If the values are within a prespecified range, the pixel is marked as being in a group. The sequence is repeated for the next pixel.
  • An illustrative flood fill criterion is: Δ ≤ K × AvgΔ, where Δ is the change in the H, S, V, R, G or B value and K is a constant.
  • Δ refers to the change in H, S, V, R, G or B value between the pixel of interest and one of the pixels adjacent to it.
  • AvgΔ refers to the average change in H, S, V, R, G or B value between the pixel of interest and remaining pixels in the image.
  • a computation for some or all of H, S, V, R, G and B can be performed for each pixel.
  • S and R, G, B are preferably given more weight than H and V, because a change in S and R, G, B likely indicates a transition from one object to another.
  • a further way to use a flood filling algorithm is to perform the foregoing, but use the leveling data described previously.
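The flood-fill grouping described above can be sketched for a single channel. This is a non-authoritative sketch: a neighbor joins a group when its value change is at most K times the image-wide average adjacent-pixel change (one plausible reading of the Δ ≤ K × AvgΔ criterion); the full method would combine several of H, S, V, R, G and B, with more weight given to S and R, G, B.

```python
from collections import deque

def flood_fill_objects(channel, k=2.0):
    """Label connected groups of similar pixels in one channel.

    `channel` is a 2D list of values; returns (labels, group_count).
    """
    h, w = len(channel), len(channel[0])
    # AvgΔ: average change between horizontally/vertically adjacent pixels
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(channel[y][x] - channel[y][x + 1]); count += 1
            if y + 1 < h:
                total += abs(channel[y][x] - channel[y + 1][x]); count += 1
    limit = k * (total / count if count else 0.0)

    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < h and 0 <= xx < w
                            and labels[yy][xx] == -1
                            and abs(channel[yy][xx] - channel[y][x]) <= limit):
                        labels[yy][xx] = next_label  # Δ ≤ K × AvgΔ: join group
                        queue.append((yy, xx))
            next_label += 1
    return labels, next_label
```

Because the tolerance scales with the image's own average variation, gradual shading within an object stays in one group while a sharp object-to-object transition starts a new one.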
  • a preferred method for identifying objects in the image is to use a histogram. In general, the image is analyzed to determine the change in R, G, B, H, S, and/or V between adjacent pixels, and a histogram is created of the changes. Peaks appear in the histogram. These peaks are then used identify edges between objects in the image.
  • FIG. 15 illustrates a portion of an image 1500 having pixels A1, A2, A3..., B1, B2, B3..., and C1, C2, C3.
  • the process begins by analyzing the pixel A1 in the bottom left corner of the image and comparing the change of each of R, G, B, H, S, and/or V for each pixel A2, B1, B2 adjacent to A1.
  • the comparison process continues, moving across the image and calculating the change of R, G, B, H, S, and/or V for each adjacent pixel not previously analyzed relative to the current pixel of interest.
  • the changes in value between A1 and A2 were calculated during analysis of A1 and surrounding pixels and need not be calculated again.
  • R, G, B, H, S, and/or V can be, for example, 0 to 255. Each change in value is stored in bins ranging from 0 to 255.
  • An illustrative bin 1600 is shown in FIG. 16. Accordingly, if pixel A1 has an R value of 200, and A2 has an R value of 50, the ΔR would be 150. The bin for a ΔR value of 150 would increase by one.
  • the bin is plotted to create a histogram.
  • An illustrative histogram 1700 is shown in FIG. 17. As shown, the number of instances of little or no change between adjacent pixels is typically large, while the instances of changes of R, G, B, H, S, and/or V typically progressively decrease as the changes become more dramatic.
  • peaks 1702 will appear where one object ends and another begins. Accordingly, adjacent pixels having a change in value in the range of the peak are considered to be along an edge. This is because an edge between two different objects will create the same color change between pixels found along that edge.
  • the adjacent pixels having a change in value between the range 1704 of the peak 1702 can be detected during a subsequent flood fill process, can be detected by scanning data saved during analysis of the image for creating the histogram, etc.
  • the process may have a cutoff value 1706 for the histogram, below which any peaks are not considered.
  • the cutoff value 1706 can be a value between 1 and 50.
  • the portion of the histogram 1700 below the cutoff value 1706 primarily reflects noise in the image. This process works with any type of image or scene.
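The change-histogram method above can be sketched for a single channel. This is an illustrative sketch, not the patented implementation: each adjacent pair is visited once (right and down neighbors) rather than via the bottom-left scan of FIG. 15, the function names are hypothetical, and "peak" detection is reduced to a simple count-above-cutoff test.

```python
def change_histogram(channel, bins=256):
    """Bin the absolute value change between each pair of adjacent pixels."""
    h, w = len(channel), len(channel[0])
    counts = [0] * bins
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                counts[abs(channel[y][x] - channel[y][x + 1])] += 1
            if y + 1 < h:
                counts[abs(channel[y][x] - channel[y + 1][x])] += 1
    return counts

def peak_deltas(counts, cutoff=10, min_delta=1):
    """Change values whose count exceeds the noise cutoff (1-50 per the
    text). `min_delta` skips the dominant near-zero bins, since little or
    no change between adjacent pixels is typical of object interiors.
    """
    return [d for d, c in enumerate(counts) if d >= min_delta and c > cutoff]
```

An edge between two objects produces the same color change all along the edge, so that change value accumulates into a peak; pixel pairs whose change falls in the peak's range can then be treated as lying on an object boundary.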
  • Yet another way to identify at least some of the objects in the image frames of a sequence of frames, e.g., a movie, is to use motion-based detection.
  • in motion-based detection, changes in position of pixels relative to other pixels indicate that an object is moving. By noting which pixels move and which are stationary from frame to frame, the moving object can be identified.
  • Motion-based detection may also be used to verify and refine the objects detected using one of the other methods described previously. Now the objects in the image are identified. Missing pixels in the objects can be filled in, in a manner similar to the way the shadowed pixels are matched to the edges.
  • FIG. 18 illustrates a process 1800 for determining which objects are in the same overall object and removing shadows.
  • the image is analyzed for determining its properties.
  • objects in the image are identified based at least in part on the properties of the image.
  • a first object of the image that is at least primarily in shadow is selected.
  • a second object adjacent the first object is selected.
  • the two objects are analyzed using the properties of the image for determining whether the two objects are part of a same overall object.
  • the ratio (Rratio) of the average R value of pixels in shadow to the average R value of the pixels not in shadow is less than 1.
  • the ratio (Gratio) of the average G value of pixels in shadow to the average G value of the pixels not in shadow is less than 1.
  • the ratio (Bratio) of the average B value of pixels in shadow to the average B value of the pixels not in shadow is less than 1.
  • the ratio of average S of the pixels in shadow to average S of the pixels not in shadow is much higher than 1, e.g., 1.5 or higher.
  • the ratio of average V of the pixels in shadow to average V of the pixels not in shadow is much less than 1, e.g., 0.5 or lower.
  • the ratio of Bratio to Gratio is greater than or equal to 1. This is because, when an object goes into shadow, the blue does not drop off as quickly as the green.
  • the ratio of Gratio to Rratio is greater than or equal to 1. This is because, when an object goes into shadow, the green does not drop off as quickly as the red.
  • the user can be presented with options to select how much greater or less than 1 some or all of the above-defined ratios must be, thereby determining how aggressively the program matches shadowed and nonshadowed objects.
  • the shadow can be removed from the shadowed object (portion of the overall object) in optional operation 1812 by changing the pixel properties based on the ratios. For example, multiply the R value of a pixel in shadow by 1 /Rratio to get a new R value (Rnew) that does not appear to be in shadow.
  • similarly, new green and blue values can be calculated for pixels in shadow by multiplying the G and B values by 1/Gratio and 1/Bratio, respectively.
  • the resulting overall object will appear to not have a shadow cast on it.
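The ratio-based shadow removal above can be sketched as follows. This is a non-authoritative sketch: the function name is hypothetical, RGB pixels are modeled as (R, G, B) tuples, and clamping to 0-255 is an assumption the text does not state.

```python
def remove_shadow(shadow_rgb, lit_rgb):
    """De-shadow pixels by dividing each channel by its shadow/lit average
    ratio (equivalently, multiplying by 1/Rratio, 1/Gratio, 1/Bratio).

    `shadow_rgb` are the pixels of the shadowed object portion;
    `lit_rgb` are the pixels of the adjacent non-shadowed portion
    of the same overall object. Returns new RGB triples clamped to 0-255.
    """
    def avg(pixels, i):
        return sum(p[i] for p in pixels) / len(pixels)

    # Rratio, Gratio, Bratio: average shadowed value / average lit value
    ratios = [avg(shadow_rgb, i) / avg(lit_rgb, i) for i in range(3)]
    out = []
    for r, g, b in shadow_rgb:
        out.append(tuple(min(255, round(c / ratios[i]))
                         for i, c in enumerate((r, g, b))))
    return out
```

Scaling the ratios before dividing (rather than removing the shadow entirely) would give the lighten/darken slider behavior described below.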
  • the RGB ratios can be based on RGB values of all pixels in each object, or subsets of the pixels.
  • the user can also lighten and darken shadows on an object-by-object basis or in the overall image by adjusting the ratios used to calculate the new RGB values for the shadowed pixels.
  • the user can also be allowed to manipulate the R, G, B, H, S and/or V values of the shadowed object portion of the overall object to further refine the image.
  • the user can also be presented with a screen that shows the identified objects, and the user can select from which objects to remove or alter shadows.
  • Many of the features described herein can be tuned and/or turned on and off by the user. For example, the shadow edge softening can be turned off if desired.
  • An image processing software product provides one or more of the following features:
  • Slide show and movie output
  • Shadow and highlight correction for improving the contrast of over- or underexposed areas of an image
  • Photo touch-up including removal of redeye, dust, scratches, blemishes, wrinkles, and other flaws
  • the image processing software may include menu options to find shadows, as well as allow the user to define how aggressively to find shadows.
  • Other options presented by the image processing software may include allowing the user to define properties of shadows such as darken shadows, lighten shadows, etc. e.g., by presenting a slider to darken or lighten shadows.
  • a field may also allow the user to set percentage of shadow shown, e.g., 50% lighter.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

Systems, methods, and computer program products that provide broadcast-quality photo-realistic rendering of one or more 3D objects added to a still frame (photo mode), or series of frames (movie mode). The basic steps performed during creation of an image include analysis of a 2D image frame during a set up sequence, adding a 3D object to the frame, and adding visual effects to the frame with little or no user intervention, the visual effects including but not limited to shadowing, reflection, refraction, and transparency. Also, systems, methods, and computer program products that identify and allow manipulation of shadows in a still frame (photo mode), or series of frames (movie mode). Manipulation of shadows includes such things as adjustment of shadow boundaries, softening of shadow boundaries, adjustment of shadow properties, and even removal of shadows altogether.

Description

AUTOMATIC COMPOSITING OF 3D OBJECTS IN A STILL FRAME
OR SERIES OF FRAMES AND DETECTION AND MANIPULATION OF
SHADOWS IN AN IMAGE OR SERIES OF IMAGES
FIELD OF THE INVENTION
The present invention relates to image rendering, and more particularly, this invention relates to automated, accurate compositing of three dimensional (3D) objects added to a two dimensional (2D) frame or series of 2D frames. The present invention also relates to image processing, and more particularly, this invention relates to identification and manipulation of shadows in a still image or series of images.
BACKGROUND OF THE INVENTION
Photography and moviemaking have become staples in modern society. A photograph is typically a single image frame of a real scene. Movies can be described as a series of frames that together form what appears to the human eye to be a continuously moving image. Both photographs and movies are now found in both physical and digital formats. In recent years, advances in technology have allowed creation of entirely three dimensional worlds. 3D graphic systems are able to produce an image on a two- dimensional screen of a display in such a manner that the image simulates three- dimensional effects. In such 3D systems, the surface of a 3D object to be represented is separated into a plurality of polygonal surfaces having various arbitrary shapes. Picture data representing the polygonal areas of the 3D object are successively stored in a frame memory having memory locations corresponding to positions on a display screen to accumulate picture data which, when supplied to the display, reconstruct an image which appears to be three-dimensional. In such 3D systems the data representing each of the polygonal surfaces must be transformed in order to represent three-dimensional effects such as rotation of the object they represent. In 3D systems the image data for the various polygonal surfaces are produced in succession based on data indicating the depth of each polygonal surface from the plane of the display screen. Conventional 3D systems produce image data representing the polygonal surfaces of an object such that surfaces which cannot be seen from the point of view when displayed are produced first and stored in a display memory and the remaining data representing the polygonal surfaces are successively produced in order according to their depth from the screen. Consequently, image data representing a polygonal surface at the front of the object cover over the image data of reverse surfaces which previously were produced and stored. 
It is necessary, therefore, to include data indicating the depth of each polygonal surface (referred to as "Z data") and the order in which the data representing the polygons are produced is determined by reference to such Z data. In the conventional 3D systems a Z buffer is provided to store the Z data in pixel units and the stored Z data are compared to determine a display preference.
In conventional 3D systems, an effect of the environment on 3D objects is relatively easy to compute, as the environment also has Z data assigned to it, in addition to X (horizontal) and Y (vertical) data. However, where the environment does not have Z data associated with it, and is thus by definition a 2D environment, effects of the 2D environment on 3D objects such as reflection, refraction, shadows, etc. have heretofore not been readily and accurately rendered.
It has also been proposed in the prior art to implement a system wherein 2D image data would be produced by means of a conventional 2D system and three- dimensional image data would be produced by means of a conventional 3D system independently of the 2D system. The 2D image data and the 3D image data which have been produced independently are then added upon conversion into a video signal to be supplied to a video display device. However, this system too fails to allow accurate rendering of environmental effects on a 3D object.
Additionally, both methods simply overlay the 3D object over the 2D background image. Consider, for example, movies which add 3D objects to a background image of a real scene. Current methods render the 3D object in a 3D renderer, and composite the 3D image on the 2D frames of a film. Then, artists must go back and, frame by frame, manually draw shadows and reflections on the 3D object. This is a very time consuming and thus expensive job, considering that a typical movie runs at about 30 frames per second.
If the 3D object is supposed to be positioned behind something on the frame, present systems require that a user manually create an image mask that is exactly the same size and shape as the 2D object to be shown in front of the 3D object. Another mask, a shadow mask, is created by hand for the shadowing created by or cast onto the 3D object. Shadowing is currently performed by dimming the image, which is not an accurate representation of a shadow. Rather, the dimming appears more like a fuzzy area rather than an accurate representation of how the shadow will be cast. The typical method is to manually hand-draw a shadow mask for each frame by using ADOBE® PHOTOSHOP® or other manual graphics program.
If the designer further wants reflections in the scene, artists are called upon to make reflection maps. These are texture maps that go on the 3D model. This requires the artists to estimate what the scene around the 3D object looks like, and map this onto the 3D model. One problem is that the reflections do not look realistic, particularly on rounded or angled surfaces. Heretofore methods have not been able to accurately create the natural deformation of the reflection due to curvature of the reflecting surface. Particularly, human artists find it very difficult to conceptualize this deformation and create it in the reflection in the composite image.
Again, the state of the art is to manually perform all of these functions frame by frame, as the 2D image data does not have Z data assigned to it.
Additionally, if a surface of the 3D object is partially transparent, artists merely shade the pixels of the 2D image that would be viewable through the transparent portion of the 3D object. However, most transparent surfaces are refractive. Current methods do not account for refractive distortion. What is therefore needed is a way to automatically perform not only rendering of a 3D image in a 2D scene, but also to add realistic shadowing, reflection, refraction, transparency and other effects automatically. This would save an immense amount of man-hours when generating animations, as the role of artists could then be greatly reduced and even eliminated. Such a solution would also reduce the inherent flaws in the effects heretofore manually created by human artists.
Also, image processing software is currently used by millions of people worldwide to perform a variety of image processing functions, perhaps the most typical of which is manually changing colors of pixels to make the image appear more realistic. Examples include removal of "red-eye" from human subjects, and touching up of edges.
However, image processing software has heretofore been unable to accurately identify shadows in an image. Nor has it been able to accurately tie shadowed portions of an object to nonshadowed portions of the same object. As such, the ability to work with shadows in image processing software is limited and inaccurate at best. What is therefore needed is a way to automatically and accurately identify shadows in an image, and tie shadowed portions of objects to nonshadowed portions. This would open the door for improved photo processing capabilities, and allow automation of features that now require the manual user input. Such a solution would also reduce the inherent flaws in the effects heretofore manually created by human users. It would also be desirable to accurately remove, lighten, or darken shadows from images. Prior methods of removing or lightening shadows merely lighten the entire image, giving the image a "washed out" look. Similarly, prior methods of darkening shadows darken the entire image. These methods fail to maintain the integrity of the nonshadowed portion of the image. Accordingly, the user is often required to revert to manual tools to adjust any shadowing.
SUMMARY OF THE INVENTION
Systems, methods, and computer program products that provide broadcast-quality photo-realistic rendering of one or more 3D objects added to a still frame (photo mode), or series of frames (movie mode).
The basic steps performed during creation of an image include analysis of a 2D image frame during a set up sequence, adding a 3D object to the frame, and adding visual effects to the frame with little or no user intervention, the visual effects including but not limited to shadowing, reflection, refraction, and transparency.
In one embodiment, a 2D image is analyzed for determining several of its properties. These properties may include one or more of hue (H), saturation (S), brightness value (V), and red, green and blue (RGB) color intensity. The H, S, V, R, G and B values for each pixel are stored. The image is smoothed and edges in the frame are detected. Lines are refined based on the detected edges. The shadows in the 2D image are detected based on analysis of the HSV values of each pixel of interest and its surrounding pixels. Shadows in the frame are matched with the edges to refine and further define the shadows in the frame. Objects in the frame are identified, such as by using a flood fill algorithm to find areas outside shadows having similar H, S, V, R, G or B values. This completes the setup of the 2D image frame in this embodiment. One or more 3D objects are added to the frame. Occulting is performed to provide the effect of depth. In other words, the objects of the 2D image are analyzed to intelligently estimate which objects in the frame are in front of or behind the 3D object based on the position of the object in the frame relative to the 3D object. Since the shape of each 3D object in every direction is known, ray tracing can be used to intelligently add visual effects to the composite image. For example, reflections of one or more of the objects can be rendered on the 3D object, taking into account any curvature of the 3D object. The shadows found are adjusted to be properly cast onto the surface of the 3D object. A representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object may be rendered on the outer surface of the 3D object. This may include effective rendering on the outer surface of the 3D object by merely shading a portion of the background image that would be viewable through the transparent portion of the 3D object.
Any portion of the 2D image viewable through a transparent or semi-transparent portion of the 3D object may be adjusted to reflect the effect of refraction caused by the transparent portion of the 3D object. Other effects can also be added. For example, atmospheric effects can be estimated from analysis of the 2D image and added to the 3D object.
Also disclosed are systems, methods, and computer program products that identify and allow manipulation of shadows in a still frame (photo mode), or series of frames (movie mode). Manipulation of shadows includes such things as adjustment of shadow boundaries, softening of shadow boundaries, adjustment of shadow properties, and even removal of shadows altogether.
A method for identifying pixels in shadow in an image according to one embodiment of the present invention includes analyzing an image for determining several properties thereof, the properties including hue (H), saturation (S), brightness (V), red color content (R), green color content (G), and blue color content (B). A first histogram of H/S values calculated for each pixel in the image is calculated. A second histogram of S/V values calculated for each pixel in the image is also calculated. A line feature in the first histogram is identified. Similarly, a line feature in the second histogram is identified. Pixels having an H/S value above the line feature in the first histogram and an S/V value above the line feature in the second histogram are marked as being in shadow.
A method for manipulating shadows of an image according to one embodiment of the present invention includes analyzing an image for determining several properties thereof. A determination is made as to which pixels of the image are in shadow. A shadow boundary adjustment and/or shadow boundary softening of the image is performed for a group of pixels determined to be in shadow, an outer periphery of the pixels determined to be in shadow defining the shadow boundary.
A method for identifying an object in an image according to an embodiment of the present invention includes analyzing an image for determining several properties thereof. Objects in the image are found based at least in part on the properties of the image. A first object determined to be at least primarily in shadow is selected. A second object adjacent the first object is also selected. The two objects are analyzed using the properties of the image for determining whether the two objects are part of a same overall object.
A method for removing shadows from an image according to one embodiment includes analyzing an image for determining several properties thereof. Objects in the image are found based at least in part on the properties of the image. A first object determined to be at least primarily in shadow is selected. A second object adjacent the first object is also selected, where the second object is not primarily in shadow. The two objects are analyzed using the properties of the image for determining whether the two objects are part of a same overall object. New red (R) values are calculated for pixels in the first object based on a ratio of an average R value of the pixels in the first object to an average R value of pixels in the second object. New green (G) values are calculated for pixels in the first object based on a ratio of an average G value of the pixels in the first object to an average G value of pixels in the second object. New blue (B) values are calculated for pixels in the first object based on a ratio of an average B value of the pixels in the first object to an average B value of pixels in the second object.
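A minimal sketch of this per-channel rescaling, assuming floating-point RGB arrays and precomputed object masks (the function and variable names are illustrative, not from the specification):

```python
import numpy as np

def remove_shadow(img, shadow_mask, lit_mask):
    """Illustrative sketch: rescale each channel of the shadowed object
    so its average matches that of the adjacent lit object. img is an
    HxWx3 float array; the masks select the two objects' pixels."""
    out = img.copy()
    for c in range(3):                       # R, G, B channels in turn
        avg_shadow = img[..., c][shadow_mask].mean()
        avg_lit = img[..., c][lit_mask].mean()
        if avg_shadow > 0:
            out[..., c][shadow_mask] = np.clip(
                img[..., c][shadow_mask] * (avg_lit / avg_shadow), 0, 255)
    return out
```

In effect, the ratio of the two averages serves as a per-channel gain that lifts the shadowed pixels toward the lit object's color statistics.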
Additional embodiments and features of the present invention are presented below.
The invention can be implemented entirely in hardware, entirely in software, or a combination of the two. The invention can also be provided in the form of a computer program product comprising a computer readable medium having computer code thereon.
Other aspects and advantages of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrate by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
For a fuller understanding of the nature and advantages of the present invention, as well as the preferred mode of use, reference should be made to the following detailed description read in conjunction with the accompanying drawings.
FIG. 1 illustrates a hardware system useable in the context of the present invention.
FIG. 2A is a flow diagram of a high level process of an embodiment of the present invention.
FIG. 2B is a flow diagram of a high level process of another embodiment of the present invention.
FIG. 3A is a flow diagram of a process according to one embodiment of the present invention. FIG. 3B is a flow diagram of a process according to another embodiment of the present invention.
FIG. 4 illustrates a mask for neighborhood averaging, used during a smoothing process according to one embodiment of the present invention.
FIGS. 5A-B depict Sobel masks, used during an edge detection process according to one embodiment of the present invention.
FIG. 6 illustrates processing using a Sobel mask.
FIG. 7 depicts a Laplace mask, used during an edge detection process according to one embodiment of the present invention.
FIG. 8 illustrates an H/S histogram. FIG. 9 illustrates an S/V histogram.
FIG. 10 illustrates a mask used during shadow detection according to one embodiment of the present invention.
FIG. 11 depicts a color wheel.
FIG. 12A illustrates an image generated by a shadow detection process. FIG. 12B illustrates an image having enhanced shadow definition after a shadow detection process.
FIG. 13 depicts a pixel array after shadow edge softening.
FIG. 14 depicts a mask used during a shadow softening process. FIG. 15 illustrates a portion of an image during a histogram-based object recognition process.
FIG. 16 depicts a representation of bins used during a histogram-based object recognition process.
FIG. 17 illustrates a representative histogram used during a histogram-based object recognition process.
FIG. 18 is a flow diagram of a process for determining which objects are in the same overall object and removing shadows.
BEST MODE FOR CARRYING OUT THE INVENTION
The following description is the best embodiment presently contemplated for carrying out the present invention. This description is made for the purpose of illustrating the general principles of the present invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations.

The following specification describes systems, methods, and computer program products that provide broadcast-quality photo-realistic rendering of one or more 3D objects added to a still frame (photo mode), or series of frames (movie mode). Effects such as reflection, shadows, transparency, and refraction on or of the 3D object(s) relative to objects in the frame are automatically determined and added to the composite image. The software also dynamically determines which objects in the frame should be in front of or behind the 3D object(s) and places those objects in front of or behind the 3D object(s), as well as creates realistic reflections of the frame objects on the 3D object(s).
The following specification also describes systems, methods, and computer program products that identify and allow manipulation of shadows in a still frame (photo mode), or series of frames (movie mode). Manipulation of shadows includes such things as adjustment of shadow boundaries, softening of shadow boundaries, adjustment of shadow properties, and even removal of shadows altogether.
The invention can be implemented entirely in hardware, entirely in software, or a combination of the two. The invention can also be provided in the form of a computer program product comprising a computer readable medium having computer code thereon. A computer readable medium can include any medium capable of storing computer code thereon for use by a computer, including optical media such as read only and writeable CD and DVD, magnetic memory, semiconductor memory (e.g., FLASH memory and other portable memory cards, etc.), etc. A computer for storing and/or executing the code and/or performing the processes described herein can be any type of computing device, including a personal computer (PC), laptop computer, handheld device (e.g., personal digital assistant (PDA)), portable telephone, etc. FIG. 1 illustrates a computer 100 according to one embodiment. As shown, the computer 100 includes a system bus 102 to which a processor 104 is coupled. The processor 104 executes instructions found in the code, and controls several of the other components of the computer 100. Memory including Random Access Memory (RAM) 106 and nonvolatile memory 108 (e.g., hard disk drive) store the code or portions thereof, as well as data, during performance of the processes set forth herein. A graphics rendering subsystem 110 may also be present, and can include a separate graphics processor and additional memory.
Various In/Out (I/O) devices are also present. User input devices such as a keyboard 112 and mouse 114 allow a user to provide user instructions to the computer 100. A monitor 116 or other display device outputs graphical information to the user. If a graphics subsystem 110 is present (as shown), the display device can be coupled to the graphics subsystem 110 instead of directly to the bus 102. A network interface 118 may also be provided to allow the computer 100 to connect to remote computing devices for a variety of purposes including data upload, data download, etc. A media reader 120 such as a DVD player or FLASH memory port may be present for reading code from a computer readable medium 122.
The following description is applicable to creation of both still frame images as well as a series of frames, as in a movie. For simplicity, much of the following description shall refer to the functions performed on a single frame and single 3D object, it being understood that the procedures set forth herein can be sequentially applied to multiple frames, e.g., of a movie and for multiple 3D objects per frame.
To aid the reader in understanding the overall aspects of the invention, high level processes will first be described, followed by a detailed description of each operation. Note that while the order of the steps performed is generally preferred, the order is not critical and the software can perform some operations prior to others, some in parallel, etc. Further, not all operations presented are required by the invention, but rather are optional.
FIG. 2A depicts the high level process 200 performed during creation of an image. In operation 202, a 2D image is analyzed during a set up sequence. In operation 204, 3D object data is added to the scene. In operation 206, visual effects including but not limited to shadowing, reflection, refraction, and transparency are calculated for the 3D object and/or 2D image with little or no user intervention. In operation 208, the 3D object is then rendered into the frame with the calculated effects, along with any effects created by the 3D object on the 2D image, thereby creating a realistic, accurate, high-quality composite image.
FIG. 3A illustrates the general process 300 performed by a preferred embodiment of the present invention. In operation 302, the 2D image is analyzed for determining several of its properties. These properties include hue (H), saturation (S), brightness value (V), and red, green and blue (R, G, B) color intensity. H, S, V, R, G, and B values for each pixel are stored in a frame buffer or buffers. In operation 304, the image is smoothed. In operation 306, edges in the 2D image are detected. In operation 308, lines are refined based on the detected edges. In operation 310, the shadows in the 2D image are detected, e.g., based on analysis of the HSV values of each pixel of interest and its surrounding pixels. In operation 312, shadows in the frame are matched with the edges found in operation 306 to refine and further define the shadows in the frame. In operation 314, objects in the frame are found, e.g., using a flood fill algorithm to find areas outside shadows having similar H, S, V, R, G, and B values. This completes the setup of the 2D image frame.
In operation 316, data relating to one or more 3D objects are added to the frame. In operation 318, the objects identified in operation 314 are analyzed to estimate which objects in the frame are in front of the 3D object based on the position of the object in the frame relative to the 3D object.
Since the shape of each 3D object in every direction is known, ray tracing can be used to intelligently add visual effects to the composite image. In operation 320, reflections of one or more of the objects found in operation 314 are rendered on the 3D object, taking into account any curvature of the 3D object. In operation 322, the shadows found in operations 310 and 312 are adjusted to be properly cast onto the surface of the 3D object. In operation 324, a representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object is rendered on the outer surface of the 3D object. Note that operation 324 includes effective rendering on the outer surface of the 3D object by merely shading a portion of the background image that would be viewable through the transparent portion of the 3D object. In operation 326, any portion of the 2D image viewable through a transparent or semi-transparent portion of the 3D object is adjusted to reflect the effect of refraction caused by the transparent portion of the 3D object.

FIG. 2B depicts a high level process 250 performed during image processing according to one embodiment. In operation 252, an image is analyzed during a set up sequence. In operation 254, shadows in the image are identified. In operation 256, the image is processed for refining and/or manipulating the shadows. In operation 258, the image is then rendered with the manipulated shadows.

FIG. 3B illustrates the general process 350 performed by a preferred embodiment of the present invention. In operation 352, the image is analyzed for determining several of its properties. These properties include hue (H), saturation (S), brightness value (V), and red, green and blue (R, G, B) color intensity. H, S, V, R, G, and B values for each pixel are stored in a frame buffer or buffers. In operation 354, the image is smoothed and leveled. In operation 356, edges in the image are detected.
In operation 358, lines are refined based on the detected edges. In operation 360, the shadows in the image are detected, e.g., based on analysis of the HSV values of each pixel of interest and its surrounding pixels. In operation 362, shadows in the image are matched with the edges found in operation 356 to refine and further define the shadow boundaries. In operation 364, shadow edges are softened. In operation 366, objects in the image are found, e.g., using a flood fill algorithm to find areas inside and outside shadows having similar H, S, V, R, G, and B values. In operation 368, objects in shadow and out of shadow are correlated to define overall objects. In operation 370, the shadows can be lightened, darkened, and/or removed.

Other embodiments of the present invention may perform only a subset of the foregoing steps and/or additional steps. Further, the order in which the steps of these and other disclosed processes are presented is by way of example only, and is in no way meant to require that the present invention perform the steps in the same order presented. Rather, the various steps (or portions thereof) can be performed in any order.
Set Up
As noted above, the first stage is preparing the image for rendering. An image is loaded into the host system. The image can be, for example, a high quality natural image captured with a digital camera. The image can also be scanned from a picture, generated from film negatives, etc. The image can further be a purely computer-generated 2D or 3D image.
Several properties of the image (or interchangeably where appropriate, of the frame) are gathered. Red (R), Green (G) and Blue (B) properties are known for a digital image. The image is also analyzed to determine the Hue (H), Saturation (S) and Brightness Value (V) of each pixel in the image. Hue (H) refers to the relative color of the pixel on the red-green-blue color wheel. Saturation (S) is the degree of color depth. For example, a pure red is completely saturated while pure white is completely non-saturated. The Brightness Value (V) indicates the brightness level of the pixel.
The Saturation (S) may be determined as follows. For each pixel, determine which color (RGB) has the highest and lowest brightness (V and Vmin, respectively). RGB brightness values in one example vary from 0 to 255 (256 shades per R, G or B). The value V is set to the largest brightness value from among the three colors, i.e., the highest brightness of the three colors. For instance, a pixel in the sky portion of a landscape image would likely have blue as the brightest color. If so, V is set to the blue brightness value. Likewise, Vmin is set to the lowest brightness value. S can be determined by the following equation:
S = (V - Vmin) / V
If the pixel is black, S can be set to zero. The Hue (H) is also determined. If S = 0, H can be set as undefined. If S ≠ 0, the distance for each color can be determined by the following equation (here demonstrated as the red distance (Rdist)):
Rdist = (V - Vred) / (V - Vmin)
where Vred is the brightness value of the red color for that pixel. Likewise the green distance (Gdist) and blue distance (Bdist) are also calculated. H is then calculated using one of the following equations. If V = Vred, then:
H = Bdist - Gdist
If V = Vgreen, then:
H = 2 + Rdist - Bdist
If V = Vblue, then:
H = 4 + Gdist - Rdist
The resulting values of the foregoing analyses are in (H,S,V) form, where H varies from 0.0 to 6.0, indicating the position along the color circle where the hue is located. S and V vary from 0.0 to 1.0, with 0.0 being the least amount and 1.0 being the greatest amount of saturation or value, respectively. Note that larger or smaller scales can be used, e.g., 0-1, 0-100, 0-360, etc. For example, if the color scale is 360 (as in a color circle), H can be calculated in angular coordinates using the following formula, which is equivalent to the immediately prior formula: H = 240 + (Gdist - Rdist) * 60
The resulting values of this equation vary from 0.0 to 360.0, indicating the position along the color circle where the hue is located. As an angular coordinate, H can wrap around from 360 back to 0, so any value of H outside of the 0.0 to 360.0 range can be mapped onto that range by dividing H by 360.0 and finding the remainder (also known as modular arithmetic). Thus, -30 is equivalent to 330, and 480 is equivalent to 120, for example.
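The HSV computation above can be sketched as follows for a single pixel with 0-255 RGB inputs; this is an illustrative rendering of the stated formulas, with H returned on the 0.0-6.0 scale (multiply by 60 for degrees):

```python
def rgb_to_hsv(r, g, b):
    """Sketch of the H, S, V formulas above. Returns H on the 0.0-6.0
    scale (None if undefined), S and V on 0.0-1.0."""
    v = max(r, g, b)
    vmin = min(r, g, b)
    if v == 0:                   # black pixel: S set to zero
        return None, 0.0, 0.0
    s = (v - vmin) / v
    if s == 0:                   # grey pixel: hue undefined
        return None, 0.0, v / 255.0
    rdist = (v - r) / (v - vmin)
    gdist = (v - g) / (v - vmin)
    bdist = (v - b) / (v - vmin)
    if v == r:
        h = bdist - gdist
    elif v == g:
        h = 2 + rdist - bdist
    else:
        h = 4 + gdist - rdist
    return h % 6, s, v / 255.0   # wrap onto the 0.0-6.0 color circle
```

Wrapping with modular arithmetic mirrors the remainder mapping described above for the 0-360 scale.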
H, S, and V images of the image are created and stored in a frame buffer or buffers. The frame buffer may be a type of RAM, nonvolatile memory such as a magnetic disk, etc. If there are several images, such as in a movie mode, each image of the series of images is analyzed, and HSV images are created and stored for each image. R, G, and B values for each pixel in the image are also stored in a frame buffer or buffers.
Now that the properties of the image are determined and stored, further processing using these properties is performed. Leveling data may be created and stored for use in subsequent processing such as object identification. An R, G, B histogram is obtained of the whole image. Suppose there are 1,000,000 pixels in the image, and 256 shades per color. The number of pixels in each shade is stored in bins representing each component from 0 to 255. For example, if 8,201 pixels have an R shade of 35, the bin for shade 35 will have a value of 8,201. Each bin would have about 1,000,000/256 = 3906 pixels if the colors were evenly distributed. Using this value, the old image is mapped to the new image to obtain about an equal distribution of color across the entire image, thereby varying the contrast as much as possible. During the mapping sequence, bin 0 is summed with the next bin or bins until the sum > 3906. Supposing bins 0-2 total 4000, then all pixels associated with bins 0-2 are given a first shade. Then bin 3 is added to the next bin or bins until the sum > 3906. Then all pixels associated with bins 3-n are given a second shade. Thus, the color contrast is maximized. This in turn is useful for such things as object recognition using flood fill algorithms, as discussed below.
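The leveling walk described above is closely related to classical histogram equalization. The sketch below uses the standard cumulative-distribution mapping, rather than the literal bin-merging walk, to spread pixel counts roughly evenly across the 256 shades; the function name and scaling are illustrative, not from the specification:

```python
import numpy as np

def level_channel(channel, shades=256):
    """Equalize one 0-255 color channel: map each input shade through
    the normalized cumulative histogram so pixel counts are spread
    roughly evenly across the available shades."""
    hist = np.bincount(channel.ravel(), minlength=shades)
    cdf = np.cumsum(hist) / channel.size     # cumulative share of pixels
    mapping = np.round(cdf * (shades - 1)).astype(np.uint8)
    return mapping[channel]
```

Applied per channel, this maximizes contrast in the sense described: heavily populated shades are spread apart, sparsely populated ones are merged.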
Next, the objects in the image (or frame) are identified. To identify objects in the image, edge detection is performed to detect edges in the image (or frame), as edges tend to denote boundaries between or on objects. However, images usually have some sort of "noise" or edge distortion due to the inherent irregularities in real-life objects, as well as the limitations in the camera or scanner and as an effect of any compression. An example of image noise in a real-life object is found in concrete with a rock facade, where each rock has an edge and so appears to be its own object.
Thus, a smoothing process is preferably performed prior to detecting the edges. The purpose of noise smoothing is to reduce various spurious effects of a local nature in the image, caused perhaps by noise in the image acquisition system, or arising as a result of compression of the image, for example as is typically done automatically by consumer-grade digital cameras. The smoothing can be done either by considering the real space image, or its Fourier transform.
The simplest smoothing approach is neighborhood averaging, where each pixel is replaced by the average value of the pixels contained in some neighborhood about it.
The simplest case is probably to consider the 3x3 group of pixels centered on the given pixel, and to replace the central pixel value by the unweighted average of these nine pixels. For example, the central pixel in the mask 400 of FIG. 4 is replaced by the value 13 (the nearest integer to the average).
If any one of the pixels in the neighborhood has a faulty value due to noise, this fault will now be spread over nine pixels as the image is smoothed. This in turn tends to blur the image.
A more preferable approach is to use a median filter. A neighborhood around the pixel under consideration is used, but this time the pixel value is replaced by the median pixel value in the neighborhood. Thus, for a 3x3 neighborhood, the 9 pixel values are written in sorted order, and the central pixel is replaced by the fifth highest value. For example, again taking the data shown in FIG. 4, the central pixel is replaced by the value 12.
This approach has two advantages. First, occasional spurious high or low values are not averaged in, they are ignored. Second, the sharpness of edges is preserved.
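A sketch of this 3x3 median filter on a single-channel image (illustrative; border pixels, which the 3x3 neighborhood cannot cover, are simply left unchanged):

```python
import numpy as np

def median_smooth(img):
    """3x3 median smoothing: each interior pixel is replaced by the
    median of its neighborhood, so spurious high or low values are
    discarded rather than averaged in."""
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = np.median(img[y - 1:y + 2, x - 1:x + 2])
    return out
```

A single noisy spike surrounded by uniform pixels is removed entirely, while a genuine step edge keeps its sharpness.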
Another smoothing method is Gaussian smoothing. Gaussian smoothing is performed by convolving an image with a Gaussian operator which is defined below. By using Gaussian smoothing in conjunction with the Laplacian operator, or another Gaussian operator, it is possible to detect edges.
The Gaussian outputs a "weighted average" of each pixel's neighborhood, with the average weighted more towards the value of the central pixels. The Gaussian distribution function in two variables, g(x,y), is defined by:
g(x,y) = (1 / (2πσ^2)) e^(-(x^2 + y^2) / (2σ^2))
Where σ is the standard deviation representing the width of the Gaussian distribution. The shape of the distribution and hence the amount of smoothing can be controlled by varying σ. In order to smooth an image f(x,y), it is convolved with g(x,y) to produce a smoothed image s(x,y), i.e., s(x,y) = f(x,y) * g(x,y).
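A discrete Gaussian operator can be built directly from g(x,y); the sketch below returns a normalized mask (the function name and normalization-by-sum convention are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Discrete form of g(x, y) above: weights follow
    exp(-(x^2 + y^2) / (2 sigma^2)), normalized so they sum to one."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return g / g.sum()
```

Convolving the image with this mask (the s = f * g step above) yields the smoothed image; varying sigma controls the amount of smoothing.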
A further way to compute a Gaussian smoothing with a large standard deviation is to convolve an image several times with a smaller Gaussian. While this is computationally complex, it is practical if the processing is carried out using a hardware pipeline.
Having smoothed the image, e.g., with a Gaussian operator, the edges in the smoothed image can be detected. There are many ways to perform edge detection.
However, the majority of different methods may be grouped into two categories, gradient and Laplacian. The gradient method detects the edges by looking for the maximum and minimum in the first derivative of the image. The Laplacian method searches for zero crossings in the second derivative of the image to find edges.
One suitable gradient edge detection algorithm uses the Sobel method. The Sobel operator performs a 2-D spatial gradient measurement on an image. Typically it is used to find the approximate absolute gradient magnitude at each point in an input grayscale image. The Sobel edge detector uses a pair of 3x3 convolution masks, one estimating the gradient in the x-direction (columns) and the other estimating the gradient in the y-direction (rows). A convolution mask is usually much smaller than the actual image. As a result, the mask is slid over the image, manipulating a square of pixels at a time. Illustrative Sobel masks 500, 502 are shown in FIGS. 5A and 5B, respectively. The magnitude of the gradient is then calculated using the formula:
|G| = √(Gx^2 + Gy^2)
An approximate magnitude can be calculated using:
\G\ = \Gx\ + \Gy\
A pixel location is declared an edge location if the value of the gradient exceeds some threshold. As mentioned above, edges will have higher pixel intensity values than those surrounding it. So once a threshold is set, the gradient value can be compared to the threshold value and an edge detected whenever the threshold is exceeded. When using the Sobel method, the mask is positioned over an area of the input image, that pixel's value is changed, and then the mask is shifted one pixel to the right. This sequence continues to the right until it reaches the end of a row. The procedure then continues at the beginning of the next row. The example in FIG. 6 shows the mask 500 being slid over the top left portion of the input image 600 represented by the heavy outline 602. The formula below shows how a particular pixel in the output image 604 can be calculated. The center of the mask is placed over the pixel being manipulated in the image. It is important to notice that pixels in the first and last rows, as well as the first and last columns cannot be manipulated by a 3x3 mask. This is because when placing the center of the mask over a pixel in the first row (for example), the mask will be outside the image boundaries.
b22 = (a11*m11) + (a12*m12) + (a13*m13) + (a21*m21) + (a22*m22) + (a23*m23) + (a31*m31) + (a32*m32) + (a33*m33)

The Gx mask highlights the edges in the horizontal direction while the Gy mask highlights the edges in the vertical direction. After taking the magnitude of both, the resulting output detects edges in both directions.
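The sliding-mask computation can be sketched as follows, using the conventional Sobel coefficients (the figures' exact sign layout may differ; only interior pixels are processed, since the mask cannot cover the borders):

```python
import numpy as np

SOBEL_GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # x (columns)
SOBEL_GY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])  # y (rows)

def sobel_magnitude(img):
    """Slide both 3x3 masks over each interior pixel and combine the
    two gradient estimates as |G| = sqrt(Gx^2 + Gy^2)."""
    h, w = img.shape
    mag = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(SOBEL_GX * patch)
            gy = np.sum(SOBEL_GY * patch)
            mag[y, x] = np.hypot(gx, gy)
    return mag
```

Thresholding `mag` then declares an edge location wherever the gradient magnitude exceeds the chosen threshold.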
The approximate direction of an edge can further be calculated by assuming the angle of the edge is the inverse tangent of Δy/Δx:
θ = tan^-1(Δy / Δx)
So, for each mask (Gx or Gy), the change in value in the x direction or the change in value in the y direction are calculated to obtain an approximate angle for that edge. As mentioned before, edges will have higher pixel intensity values than those surrounding it. So once a threshold is set, you can compare the gradient value to the threshold value and detect an edge whenever the threshold is exceeded. Furthermore, when the first derivative is at a maximum, the second derivative is zero. As a result, another alternative to finding the location of an edge is to locate the zeros in the second derivative. This method is known as the Laplacian.
In one embodiment of the present invention, the Laplacian method uses a single 5x5 convolution mask to approximate the second derivative, unlike the Sobel method, which approximates the gradient. Instead of two 3x3 Sobel masks, one for each of the x and y directions, Laplace uses one 5x5 mask for the second derivative in both the x and y directions. However, because this mask approximates a second derivative measurement on the image, it is very sensitive to noise and is therefore less preferable to the Sobel method, and is thus presented here as an alternate method. The Laplace mask 700 is shown in FIG. 7.

At this point in the processing, the edges and directions (angles) of the edges have been calculated for all of the objects in the image or frame. In an image, after smoothing and edge detection, several fringe lines are typically found along the real edge line. In order to make the lines appear clean and continuous, it is desirable to remove the fringe lines and fill in any discontinuities. Accordingly, the process continues by thinning the lines found during edge detection, and intelligently filling the lines. A hysteresis algorithm known in the art is run to thin each line along the angle calculated for the line. The algorithm removes fringe lines and thins the line. However, this may result in a thin, broken line. The breaks may indicate an end of the line and start of another, or may just be gaps created by the hysteresis algorithm or missing from the original image.
Accordingly, the next step is to identify the ends of the lines. The following equation can be used to find the ends of the line:
Line density = (number of potential pixels + 5F - 2E) / (number of potential pixels)
where F = number of full pixels, and E = number of empty pixels. The multipliers 5 and 2 are provided by way of example, and can be varied depending on the desired sensitivity. Every pixel along a line is considered a potential pixel, and so each step along the line increases the potential pixel count by one. Every time a pixel is filled in, the count of F is increased by one. When a potential pixel is not filled in, the count of E is increased by one.
When the line density drops below zero, the process stops and the system assumes it has found the end of the line. Now that the ends of the lines are known, a line thickening algorithm is run along the line to build up the line density between the ends previously identified.
Every time there is a potential pixel in a gap, the gap is filled in. The result is a solid line. The foregoing sets up the image for subsequent processing. As mentioned above, the data heretofore generated can be stored in one or more frame buffers. Likewise, results of the subsequent processing steps can also be stored in one or more frame buffers.
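The end-of-line test above can be sketched as a walk over the sampled pixels (the multipliers 5 and 2 from the example equation are kept as defaults; the function name is illustrative):

```python
def find_line_end(pixels, full_weight=5, empty_weight=2):
    """Walk along a line's sampled pixels (True = filled) and declare
    the end where the line density drops below zero."""
    potential = full = empty = 0
    for i, filled in enumerate(pixels):
        potential += 1
        if filled:
            full += 1
        else:
            empty += 1
        density = (potential + full_weight * full
                   - empty_weight * empty) / potential
        if density < 0:
            return i              # end of line declared here
    return len(pixels)            # no end found in these samples
```

For example, a run of three filled pixels followed by empties keeps the density positive for some distance before the accumulating empty count finally drives it negative.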
The sequence continues by identifying shadows in the image. A preferred method for identifying shadows analyzes how the H, S and V values change across the image using the HSV data previously stored during set up. Thus, the procedure is not just looking for an area that is darker, but rather compares the properties in a given pixel to the properties in surrounding pixels.
Two histograms are created. One histogram is generated by analyzing H/S values for each pixel. A representative H/S histogram 800 is shown in FIG. 8. Supposing there are 256 possible values of H/S, the resulting H/S values are tabulated, e.g., in bins from 0-255, the value in each bin increasing each time an H/S value matching that bin is detected. The tabulated values are then plotted in a histogram. The second histogram is generated by analyzing S/V values for each pixel. A representative S/V histogram 900 is shown in FIG. 9. Again, the number of occurrences of a particular value can be tabulated in bins.
The H/S and S/V histograms are then used to identify shadows based on the tendency that a border between shadow and not shadow lies near H = S and S = V, that is, S/V = 1 and H/S = 1. Thus, the histograms tend to have a feature in the plot line, such as a peak or valley, corresponding to the shadow edges. The scaling of the histograms is arbitrary; for example, ratio values from 0.0 to 10.0 may be scaled to bins 0-255, in which case a ratio of 1.0 corresponds to about bin 25 or 26, and the peak or valley is sought near that point, with 1.0 taken as the minimum. How these tendencies are taken advantage of is presented next.
With continued reference to FIG. 8, shadow detection continues by comparing H/S for each pixel with the H/S histogram 800. The inventor has found that the histogram will typically have a line feature 802 such as a valley defined by two peaks. Any pixels having an H/S value to the left of the valley are flagged as potentially not in a shadow. Any pixels having an H/S value to the right of the valley are flagged as potentially being in a shadow. Pixels on the line feature, e.g., in the valley, may be marked as potentially being in shadow or potentially not being in shadow, depending on a default setting or user preference. The line feature is typically around 1.0 on a 0.0-10.0 point scale (or about 25-26 on a 0-255 scale). In several image processing experiments, the line feature was found to typically lie just above 1.0 on a 10.0 point scale.

With reference to FIG. 9, the S/V for each pixel is also compared with the S/V histogram 900. The inventor has found that the S/V histogram will typically have a line feature 902 such as a peak, typically around 1.0 on a 0.0-10.0 point scale (or about 25-26 on a 0-255 scale). Any pixels having an S/V value to the left of the peak are flagged as potentially not in a shadow. Any pixels having an S/V value to the right of the peak are flagged as potentially being in a shadow. Pixels on the line feature, e.g., on the peak, may be marked as potentially being in shadow or potentially not being in shadow, depending on a default setting or user preference. In several image processing experiments, the line feature was found to typically lie just above 1.0 on a 10.0 point scale.
If a particular pixel is flagged as potentially being in shadow by both histogram analyses, then the pixel is marked as being in a shadow. If a pixel is flagged as potentially being in a shadow and potentially not being in a shadow, the pixel can be considered to be in a shadow or not depending on the default or user-defined setting. For either histogram analysis, the threshold value for determining whether to flag the pixel as in shadow or not can be set at a default value, e.g., 1.0. Further, a user may be allowed to modify the flagging thresholds of the histograms in order to further define what is considered to be in shadow or not.
Also note that flagging a pixel as potentially in shadow or not is not limited to actual marking of the data, but also includes flagging by not marking data, e.g., marking pixels in shadow and not marking pixels not in shadow, thereby essentially flagging each pixel based on whether or not it is marked. Thus, for values P and Q as determined from the histograms, if H/S > P and S/V > Q for a particular pixel, then the pixel is considered to be in a shadow. These tests are based on the following observations. In a shadow, S goes up while V goes down. If something is black, S may stay the same. The inventor has also found that shadows in images have a high blue saturation: in a shadow, which inherently is not receiving direct light from a primary light source, blue light tends to scatter into the shadow more than red and green light. Thus, H is important because the bluer the pixel's H is relative to S, the more likely the pixel is to be in a shadow. Accordingly, the position and area of a shadow in an image can be estimated, giving a very good approximation of where the shadows are located.
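The combined per-pixel test can be sketched as follows. This is an illustrative pure-Python fragment, not the patented implementation: the function name, the default thresholds of 1.0, and the epsilon guard against zero channels are assumptions; H, S and V are taken to be 2D lists of floats.

```python
def detect_shadow_mask(h, s, v, p=1.0, q=1.0, eps=1e-6):
    """Flag a pixel as in shadow when H/S > P and S/V > Q, the combined
    test described in the text.  p and q are the threshold values read
    off the H/S and S/V histograms (1.0 here is an assumed default)."""
    rows, cols = len(h), len(h[0])
    mask = [[False] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            hs = h[y][x] / max(s[y][x], eps)   # guard against S == 0
            sv = s[y][x] / max(v[y][x], eps)   # guard against V == 0
            mask[y][x] = hs > p and sv > q     # both flags => in shadow
    return mask

# toy example: the left pixel has high H/S and S/V, the right one does not
mask = detect_shadow_mask([[0.9, 0.2]], [[0.6, 0.5]], [[0.3, 0.9]])
```

In a real pipeline the thresholds P and Q would be located on the histogram line features rather than fixed at 1.0.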
The foregoing histogram analyses may also be performed with smoothing. A mask is run over the image to detect how H, S and V change in all directions (x and y). An illustrative 3x3 mask 1000 is shown in FIG. 10. H runs from 0 to 360 degrees on a color wheel. FIG. 11 depicts a color wheel 1100. Assume the angular coordinates are converted to values between 0 and 1, where 0 = 0° and 1 = 360°; thus, 180° corresponds to 0.5, etc. 0 is red, 0.333 (120°) is green, and 0.667 (240°) is blue.
Using the mask, add up the H, S and V values for all pixels in the mask (around and including the center pixel) to create the histograms. Then compare the H/S and S/V data to the histograms as above. For histogram feature values P and Q, if H/S > P and S/V > Q, then the center pixel is considered to be in a shadow.
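A minimal sketch of this smoothed variant, summing H, S and V over each pixel's in-bounds 3x3 neighborhood before forming the ratios; the function name and the edge handling (partial sums at the image border) are assumptions, not taken from the disclosure.

```python
def shadow_mask_smoothed(h, s, v, p=1.0, q=1.0, eps=1e-6):
    """3x3-mask variant of the shadow test: sum H, S and V over each
    pixel's neighborhood, then apply H/S > P and S/V > Q to the sums."""
    rows, cols = len(h), len(h[0])
    mask = [[False] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            sum_h = sum_s = sum_v = 0.0
            for dy in (-1, 0, 1):              # 3x3 mask around (y, x)
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        sum_h += h[ny][nx]
                        sum_s += s[ny][nx]
                        sum_v += v[ny][nx]
            mask[y][x] = (sum_h / max(sum_s, eps) > p and
                          sum_s / max(sum_v, eps) > q)
    return mask

# a uniform patch: summed ratios are 1.5 and 2.0 at every pixel
h = [[0.9] * 4 for _ in range(4)]
s = [[0.6] * 4 for _ in range(4)]
v = [[0.3] * 4 for _ in range(4)]
m = shadow_mask_smoothed(h, s, v)
```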
In any given image, a shadow will typically not be a consistent shade, due in part to the texture of the shadowed surface, and in part to inherent variations in image quality caused by such things as equipment (camera) flaws, film variations, and compression effects. These effects may cause errors in the estimation of the shadows in the image.
FIG. 12A illustrates how a shadow 1200 may have a jagged edge, for example. Taking advantage of the fact that the edges 1202 in the image have already been detected, the image is processed both horizontally and vertically to match the shadow 1200 to the nearest edge 1202. If the shadow 1200 is within a certain proximity (number of pixels) of the edge 1202, the shadow area is moved to the edge. The shadows then appear close to what they should be, as shown in FIG. 12B.
A shadow boundary adjustment process according to one embodiment includes scanning the image line by line both horizontally and vertically. When an edge is detected (as previously found during edge detection), a number of pixels between the edge and pixels in shadow is determined. If that number is within a predetermined number of pixels, e.g., 2-10 pixels, the shadow boundary is extended back to the edge.
This may entail marking additional pixels as being in shadow.
Similarly, if the pixels being analyzed are in shadow, when the scan exits the shadow, a determination is made as to whether another edge is encountered in a predetermined number of pixels, e.g., 2-10 pixels. If so, the shadow boundary is extended out to the edge.
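The scanline extension described in the two preceding paragraphs might be sketched as follows for a single row; the function name, the symmetric search on both sides of an edge, and the default gap of 5 pixels are illustrative assumptions (the text suggests 2-10 pixels). A full implementation would run this over every row and every column.

```python
def extend_shadow_to_edges(shadow, edges, max_gap=5):
    """Scan one row: if at most max_gap non-shadow pixels separate a
    detected edge from shadowed pixels, mark that run as shadow so the
    shadow boundary snaps to the edge.  Inputs are boolean lists."""
    out = list(shadow)
    n = len(out)
    for i, is_edge in enumerate(edges):
        if not is_edge:
            continue
        # look ahead of the edge for shadow within max_gap pixels
        for j in range(i + 1, min(i + 1 + max_gap, n)):
            if out[j]:
                for k in range(i + 1, j):
                    out[k] = True              # fill the gap with shadow
                break
        # and behind the edge as well
        for j in range(i - 1, max(i - 1 - max_gap, -1), -1):
            if out[j]:
                for k in range(j + 1, i):
                    out[k] = True
                break
    return out

# edge at index 2, shadow starting at index 5: the 2-pixel gap is filled
row_shadow = [False] * 5 + [True] * 3
row_edges = [False, False, True, False, False, False, False, False]
snapped = extend_shadow_to_edges(row_shadow, row_edges, max_gap=5)
```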
The number of pixels to search beyond the edge or shadow boundary can be based on one or more factors. For example, it can be based on the histograms, since shadow pixels along the edges may have fallen on the wrong side of the line features in the histograms. It can also be an arbitrary number defined by the user, can be preset, can be based on a percentage of the total pixels in the image, or can be based on a percentage, e.g., 1%, of the image width or height.

The edges of the shadow are preferably softened as part of extending the shadow to the edges as described above. One way to do this is to mark how much a pixel is in shadow along some scale, e.g., 0-255, where 0 is out of shadow and 255 is completely in shadow. Assume an edge now abuts a shadow to the right of the edge, as shown in the pixel array 1300 of FIG. 13. Before processing, pixels to the left of the edge have a value of 0, while pixels to the right of the edge pixel have a value of 255. To soften the edges of the shadow, the pixel 1302 to the left of the edge pixel 1304 is given a value of 0, and the properties of the pixels to the right of the edge pixel 1304 are adjusted to a percentage of full shadow properties. The rate of shadow softening can be based on a linear interpolation, as shown.

It is also desirable to soften the shadow edges where the shadow pixels already meet the pre-identified edges, in order to avoid a sharp change in contrast from shadow to non-shadow pixels. To achieve this, every pixel that is supposed to be in a shadow is analyzed, along with its neighbors, to determine how many of its neighbors are also in the shadow. This can be performed using, e.g., a 3x3 mask or 5x5 mask. As shown in FIG. 14, if six of the nine pixels in the mask 1400 are marked as being "in" a shadow, indicating that the center pixel is near an edge of the shadow, the center pixel is given a shadow adjustment value, e.g., of 6/9, so that the shadow is applied at about 67% of full strength, lightening the center pixel accordingly. If all pixels in the mask are in a shadow, the center pixel is given a shadow adjustment of 9/9, i.e., no lightening. Note that this may eliminate the aforementioned softening element in a combined shadow extension/softening process. At this point, all of the shadows have been identified.
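A sketch of this neighbor-counting softening step, using a 3x3 mask and treating count/9 as the shadow strength (so 9/9 is full shadow and smaller counts lighten the pixel); the function name and the treatment of out-of-bounds neighbors are assumptions.

```python
def soften_shadow(shadow):
    """For each shadowed pixel, count how many of the 9 pixels in its
    3x3 mask (itself included) are also in shadow, and use count/9 as
    the shadow strength so pixels near a shadow edge are lightened."""
    rows, cols = len(shadow), len(shadow[0])
    strength = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if not shadow[y][x]:
                continue                        # non-shadow pixels stay 0
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and shadow[ny][nx]:
                        count += 1
            strength[y][x] = count / 9.0        # 9/9 = full shadow
    return strength

# a 3x3 all-shadow patch: the center keeps full strength, corners soften
strength = soften_shadow([[True] * 3 for _ in range(3)])
```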
The process next identifies the objects in the image. Note that object identification can be performed earlier, but finding the shadows first results in some efficiency benefits, as will soon become apparent. As an optional set up step, the H, S, and V images can be smoothed. The inventor has found through experimentation that the following processes work better after smoothing.
One method for identifying objects is to execute a flood fill algorithm of a type known in the art that groups all areas within a certain color range of each other. Each grouped area can then be considered an object.
A more accurate way to use a flood filling algorithm is to use the average differences in H, S, V, R, G and B to determine how much areas of the image vary in color and H, S and V. In this method, a mask (e.g., 3x3 mask) is run over all of the pixels to determine the average changes between adjacent pixels for all six categories (H, S, V, R, G, B). A flood fill algorithm is executed to find areas of similar pixels outside the shadows. This allows the program to compensate for such things as faded or bright areas on an object, etc.
An illustrative flood fill algorithm looks at a pixel and compares its H, S, V, R, G, and/or B values to those of its neighboring pixels. If the values are within a prespecified range, the pixel is marked as being in a group. The sequence is repeated for the next pixel. An illustrative flood fill algorithm is:
If Δξ/(avg Δξ × K) < 1.0, then in range
where ξ is the H, S, V, R, G or B value and K is a constant. Δξ refers to the change in the H, S, V, R, G or B value between the pixel of interest and one of the pixels adjacent to it. Avg Δξ refers to the average change in the H, S, V, R, G or B value between adjacent pixels across the image, as determined with the mask. A computation for some or all of H, S, V, R, G and B can be performed for each pixel. S and R, G, B are preferably given more weight than H and V, because a change in S or in R, G, B more likely indicates a transition from one object to another.
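An illustrative pure-Python flood fill using this in-range test on a single channel; a full implementation would combine all six channels with the per-channel weights described above. The function name, the breadth-first traversal, and the constant K = 2.0 are assumptions for the sketch.

```python
from collections import deque

def flood_fill_group(values, avg_delta, seed, k=2.0):
    """Group pixels reachable from seed whose channel change from a
    neighbor satisfies delta / (avg_delta * K) < 1.0, i.e. the change is
    small relative to the image-wide average change."""
    rows, cols = len(values), len(values[0])
    group = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and (ny, nx) not in group:
                delta = abs(values[ny][nx] - values[y][x])
                if delta / (avg_delta * k) < 1.0:   # in range => same group
                    group.add((ny, nx))
                    queue.append((ny, nx))
    return group

# toy channel: a dark 2x2 patch on a bright background; with an average
# change of 20 and K = 2, only small-delta neighbors join the group
img = [[10, 10, 200],
       [10, 10, 200],
       [200, 200, 200]]
region = flood_fill_group(img, avg_delta=20.0, seed=(0, 0))
```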
A further way to use a flood filling algorithm is to perform the foregoing, but using the leveling data described previously. A preferred method for identifying objects in the image is to use a histogram. In general, the image is analyzed to determine the change in R, G, B, H, S, and/or V between adjacent pixels, and a histogram of the changes is created. Peaks appear in the histogram, and these peaks are then used to identify edges between objects in the image.
FIG. 15 illustrates a portion of an image 1500 having pixels A1, A2, A3..., B1, B2, B3..., and C1, C2, C3.... The process begins by analyzing the pixel A1 in the bottom left corner of the image and calculating the change in each of R, G, B, H, S, and/or V for each pixel A2, B1, B2 adjacent to A1. The comparison process continues, moving across the image and calculating the change in R, G, B, H, S, and/or V for each adjacent pixel not previously analyzed relative to the current pixel of interest. In other words, when analyzing pixel A2 and its surrounding pixels, the changes in value between A1 and A2 were already calculated during the analysis of A1 and need not be calculated again.
The values for R, G, B, H, S, and/or V can be, for example, 0 to 255. Each change in value is stored in bins ranging from 0 to 255. An illustrative bin 1600 is shown in FIG. 16. Accordingly, if pixel Al has an R value of 200, and A2 has an R value of 50, the ΔR would be 150. The bin for a ΔR value of 150 would increase by one.
Once the image is analyzed, the bin is plotted to create a histogram. An illustrative histogram 1700 is shown in FIG. 17. As shown, the number of instances of little or no change between adjacent pixels is typically large, while the instances of changes of R, G, B, H, S, and/or V typically progressively decrease as the changes become more dramatic.
The inventor has found that peaks 1702 will appear where one object ends and another begins. Accordingly, adjacent pixels having a change in value in the range of the peak are considered to be along an edge, because an edge between two different objects will create the same color change between the pixels found along that edge. The adjacent pixels having a change in value within the range 1704 of the peak 1702 can be detected during a subsequent flood fill process, by scanning data saved during analysis of the image for creating the histogram, etc.
The process may have a cutoff value 1706 for the histogram, below which any peaks are not considered. For example, the cutoff value 1706 can be a value between 1 and 50. Typically, the portion of the histogram 1700 below the cutoff value 1706 primarily reflects noise in the image. This process works with any type of image or scene.
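The delta-histogram construction might be sketched as follows for one channel. Counting each adjacent pair once mirrors the "not previously analyzed" rule above; here each pixel is compared with its right and lower neighbors. The function name is an assumption, and peak finding above the cutoff is left out for brevity.

```python
from collections import Counter

def delta_histogram(channel):
    """Histogram of absolute changes in one channel between each pixel
    and its right and lower neighbors (each adjacent pair counted once).
    Bin keys are the delta values, e.g. 0-255 for 8-bit channels."""
    rows, cols = len(channel), len(channel[0])
    bins = Counter()
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:   # horizontal pair
                bins[abs(channel[y][x + 1] - channel[y][x])] += 1
            if y + 1 < rows:   # vertical pair
                bins[abs(channel[y + 1][x] - channel[y][x])] += 1
    return bins

# two flat regions meeting at a vertical boundary: the boundary shows up
# as a secondary peak at delta = 150, away from the large peak at 0
img = [[200, 200, 50, 50],
       [200, 200, 50, 50],
       [200, 200, 50, 50]]
hist = delta_histogram(img)
```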
Yet another way to identify at least some of the objects in the image frames of a sequence of frames, e.g., a movie, is to use motion-based detection. In this process, changes in position of pixels relative to other pixels indicate that an object is moving. By noting which pixels move and which are stationary from frame to frame, the moving object can be identified. Motion-based detection may also be used to verify and refine the objects detected using one of the other methods described previously. Now the objects in the image are identified. Missing pixels in the objects can be filled in, in a manner similar to the way the shadowed pixels are matched to the edges.
The present invention also provides the capability to accurately determine which pixels are in the same overall object, whether they are in shadow or not, as well as remove shadows from images. FIG. 18 illustrates a process 1800 for determining which objects are in the same overall object and removing shadows. In operation 1802, the image is analyzed for determining its properties. In operation 1804, objects in the image are identified based at least in part on the properties of the image. In operation 1806, a first object of the image that is at least primarily in shadow is selected. In operation 1808, a second object adjacent the first object is selected. In operation 1810, the two objects are analyzed using the properties of the image for determining whether the two objects are part of a same overall object.
Thus, if two objects border each other, the pixels are adjacent, and one or more (preferably all) of the following are true, then the two objects are considered to be in the same overall object.
• The ratio (Rratio) of the average R value of the pixels in shadow to the average R value of the pixels not in shadow is less than 1.
• The ratio (Gratio) of the average G value of the pixels in shadow to the average G value of the pixels not in shadow is less than 1.
• The ratio (Bratio) of the average B value of the pixels in shadow to the average B value of the pixels not in shadow is less than 1.
• The ratio of the average S of the pixels in shadow to the average S of the pixels not in shadow is much higher than 1, e.g., 1.5 or higher.
• The ratio of the average V of the pixels in shadow to the average V of the pixels not in shadow is much less than 1, e.g., 0.5 or lower.
• The ratio of Bratio to Gratio is greater than or equal to 1. This is because, when an object goes into shadow, the blue does not drop off as quickly as the green.
• The ratio of Gratio to Rratio is greater than or equal to 1. This is because, when an object goes into shadow, the green does not drop off as quickly as the red.
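The ratio tests above can be sketched as a single predicate. The (r, g, b, s, v) tuple format, the function name, and the requirement that all tests pass are assumptions; the 1.5 and 0.5 thresholds are the example values from the text.

```python
def same_overall_object(shadow_px, lit_px):
    """Decide whether a shadowed object and an adjacent unshadowed
    object belong to the same overall object, using the ratio tests.
    Each argument is a list of (r, g, b, s, v) tuples for one object."""
    def avg(px, i):
        return sum(p[i] for p in px) / len(px)
    r_ratio = avg(shadow_px, 0) / avg(lit_px, 0)   # Rratio
    g_ratio = avg(shadow_px, 1) / avg(lit_px, 1)   # Gratio
    b_ratio = avg(shadow_px, 2) / avg(lit_px, 2)   # Bratio
    s_ratio = avg(shadow_px, 3) / avg(lit_px, 3)
    v_ratio = avg(shadow_px, 4) / avg(lit_px, 4)
    return (r_ratio < 1 and g_ratio < 1 and b_ratio < 1 and
            s_ratio >= 1.5 and v_ratio <= 0.5 and
            b_ratio / g_ratio >= 1 and g_ratio / r_ratio >= 1)

# a plausible shadowed/lit pair: darker, more saturated in shadow, and
# blue fades least, then green, then red
shadow_px = [(40, 60, 90, 0.9, 0.2)]
lit_px = [(160, 150, 140, 0.5, 0.8)]
match = same_overall_object(shadow_px, lit_px)
```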
The user can be presented with options to select how much greater or less than 1 some or all of the above-defined ratios must be, thereby determining how aggressively the program matches shadowed and nonshadowed objects.
Now that it is known which objects are in the same overall object, and having determined the ratios, the shadow can be removed from the shadowed object (portion of the overall object) in optional operation 1812 by changing the pixel properties based on the ratios. For example, multiply the R value of a pixel in shadow by 1/Rratio to get a new R value (Rnew) that does not appear to be in shadow.
Likewise, new green and blue values can be calculated for pixels in shadow by multiplying the G and B values by 1/Gratio and 1/Bratio, respectively. The resulting overall object will appear to not have a shadow cast on it. Note that the RGB ratios can be based on the RGB values of all pixels in each object, or on subsets of the pixels.
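Shadow removal by inverse-ratio scaling might look like the following sketch; the clamping to 255, the rounding, and the function name are assumptions about the pixel format rather than part of the disclosure.

```python
def remove_shadow(shadow_px, r_ratio, g_ratio, b_ratio):
    """Lift a shadow by scaling each shadowed pixel's R, G and B by the
    inverse of the shadow/non-shadow average ratios (1/Rratio etc.)."""
    out = []
    for r, g, b in shadow_px:
        out.append((min(255, round(r / r_ratio)),
                    min(255, round(g / g_ratio)),
                    min(255, round(b / b_ratio))))  # clamp to 8-bit range
    return out

# ratios as measured between the shadowed and lit parts of an object
restored = remove_shadow([(40, 60, 90)],
                         r_ratio=0.25, g_ratio=0.4, b_ratio=0.643)
```

Scaling by 1/ratio restores each channel to roughly the average level of the unshadowed portion of the overall object.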
As opposed to removing the shadow, the user can also lighten or darken shadows on an object-by-object basis, or in the overall image, by adjusting the ratios used to calculate the new RGB values for the shadowed pixels. The user can also be allowed to manipulate the R, G, B, H, S and/or V values of the shadowed portion of the overall object to further refine the image. The user can also be presented with a screen that shows the identified objects, and select from which objects to remove or alter shadows. Many of the features described herein can be tuned and/or turned on and off by the user; for example, the shadow edge softening can be turned off if desired.
Note also that the processes described herein, or portions thereof, may also be incorporated in an image processing software product. An image processing software product according to an embodiment provides one or more of the following features:
• Slide show and movie output
• Movie mode
• Cross-compatibility with existing software products
• Batch processing of images
• Image labeling
• Image search
• Matching color schemes across images
• Color replacement for changing a color of selected portions of an image
• Shadow and highlight correction for improving the contrast of over- or underexposed areas of an image
• Photo touch-up including removal of redeye, dust, scratches, blemishes, wrinkles, and other flaws
• Dimensional effects such as image wrapping, stretching, curling, bending
• Remove image blurring
• Correct lens distortion
• Image noise reduction
• Crop and straighten
• Compositing of 2D objects
• Painting and drawing tools
• Addition of text and labels
• Creation of animations
• Multi-level undo feature
The image processing software may include menu options to find shadows, as well as allow the user to define how aggressively to find shadows. Other options presented by the image processing software may include allowing the user to define properties of shadows, such as darkening or lightening them, e.g., by presenting a slider to darken or lighten shadows. A field may also allow the user to set the percentage of shadow shown, e.g., 50% lighter. The features and processes described herein can also be used in conjunction with software and hardware for automatic compositing of 3D objects in a still frame or series of frames, as described in copending US Patent Application entitled "Automatic Compositing of 3D Objects In a Still Frame or Series of Frames," filed October 28, 2005 under serial number 11/262,262, which is herein incorporated by reference. While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

What is claimed is:
1. A method for generating an image, comprising the computer implemented steps of: analyzing a two dimensional (2D) image for determining several properties thereof; detecting edges in the 2D image based on the properties of the 2D image; finding objects in the 2D image based at least in part on the properties of the 2D image; adding a three dimensional (3D) object to the 2D image; estimating which objects in the 2D image are positioned in front of the 3D object, or which objects in the 2D image are positioned behind the 3D object; rendering at least one of the following effects: a reflection of one of the objects in the 2D image on an outer surface of the 3D image, a shadow cast by one of the objects in the 2D image on an outer surface of the 3D image, a representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object, and an effect of refraction on one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object.
2. A method as recited in claim 1, wherein the properties are selected from a group consisting of hue, saturation, brightness, red color content, green color content, and blue color content.
3. A method as recited in claim 1, wherein the properties include each of hue, saturation, brightness, red color content, green color content, and blue color content.
4. A method as recited in claim 1, further comprising smoothing the 2D image prior to detecting the edges in the 2D image.
5. A method as recited in claim 1, further comprising refining lines based on the detected edges in the 2D image.
6. A method as recited in claim 1, further comprising detecting shadows in the 2D image based at least in part on the properties of the 2D image.
7. A method as recited in claim 6, further comprising adjusting a boundary of the detected shadows in the 2D image based on a proximity to the detected edges.
8. A method as recited in claim 1, further comprising adjusting the shadow cast by one of the objects in the 2D image on the outer surface of the 3D image to reflect an atmospheric condition in the 2D image.
9. A method as recited in claim 1, further comprising adjusting the outer surface of the 3D image based on an effect of an atmospheric condition in the 2D image.
10. A method as recited in claim 1, wherein the objects in the 2D image are found using a flood fill algorithm.
11. A method as recited in claim 1, wherein the objects in the 2D image are found using a histogram.
12. A method as recited in claim 1, wherein at least some of the objects in the 2D image are found using motion-based detection.
13. A method as recited in claim 1, further comprising storing leveling data, wherein the finding the objects in the 2D image uses the leveling data.
14. A method as recited in claim 1, further comprising performing radiosity processing.
15. A method as recited in claim 1, further comprising performing caustics processing.
16. A method as recited in claim 1, further comprising adding motion blur to the 3D object.
17. A method as recited in claim 1, further comprising performing depth of field focusing.
18. A method as recited in claim 1, wherein the effects are added to the 3D image based at least in part on ray tracing.
19. A method as recited in claim 1, wherein the method is performed for a series of images.
20. A method for generating a motion picture, comprising: creating a sequence of frames, each frame having a two dimensional (2D) image of a real-life scene; performing the method of claim 1 for each frame in the sequence for creating a sequence of composite images; and allowing a user to review the composite images.
21. A method as recited in claim 20, further comprising allowing the user to manually change the composite images.
22. A method for generating a composite image, comprising: performing the method of claim 1 for creating a composite image, wherein the 3D object represents a physical building construction.
23. A method for generating a medical image, comprising: performing the method of claim 1 for creating a composite image, wherein the 3D object represents a human anatomical feature.
24. A method for generating a medical image, comprising: performing the method of claim 1 for creating a composite image, wherein the 3D object represents a medical device.
25. A method as recited in claim 24, wherein the 2D image includes a real-life human anatomical feature.
26. A method as recited in claim 1, wherein the composite image includes a landscape.
27. A method as recited in claim 1, wherein the composite image includes an interior of a building structure.
28. A method as recited in claim 1, further comprising allowing a user to apply masks for manipulating the image.
29. A method as recited in claim 28, further comprising allowing the user to mask a 2D object, and attaching a previously unassociated portion of the 2D image with the masked object.
30. A method as recited in claim 1, further comprising providing the following features: slide show and movie output, batch processing of images, image labeling, matching color schemes across multiple images, color replacement for changing a color of selected portions of an image, shadow and highlight correction for improving the contrast of over- or underexposed areas of an image, photo touch-up including removal of redeye, dust, scratches, blemishes, wrinkles, dimensional effects including image wrapping, stretching, curling, and bending, removing image blurring, correcting lens distortion, image noise reduction, cropping and straightening an image, painting and drawing tools, and allowing addition of text and labels.
31. A system for generating an image, comprising: logic for analyzing a two dimensional (2D) image for determining several properties thereof; logic for detecting edges in the 2D image based on the properties of the 2D image; logic for finding objects in the 2D image based at least in part on the properties of the 2D image; logic for adding a three dimensional (3D) object to the 2D image; logic for estimating which objects in the 2D image are positioned in front of the 3D object, or which objects in the 2D image are positioned behind the 3D object; logic for adding at least one of the following effects to the 3D image in a way that the at least one effect is substantially accurately represented on an outer surface of the 3D image: a reflection of one of the objects in the 2D image, a shadow cast by one of the objects in the 2D image, a representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object, and an effect of refraction on one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object.
32. A computer program product, comprising: a computer readable medium having computer code thereon for generating an image, the computer code including: computer code for analyzing a two dimensional (2D) image for determining several properties thereof; computer code for detecting edges in the 2D image based on the properties of the 2D image; computer code for finding objects in the 2D image based at least in part on the properties of the 2D image; computer code for adding a three dimensional (3D) object to the 2D image; computer code for determining which objects in the 2D image are positioned in front of the 3D object, or which objects in the 2D image are positioned behind the 3D object; computer code for adding at least one of the following effects to the 3D image in a way that the at least one effect is substantially accurately represented on an outer surface of the 3D image: a reflection of one of the objects in the 2D image, a shadow cast by one of the objects in the 2D image, a representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object, and an effect of refraction on one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object.
33. A method for generating an image, comprising the computer implemented steps of: adding a three dimensional (3D) object to a two dimensional (2D) image; rendering a reflection of one of the objects in the 2D image on an outer surface of the 3D image; rendering a representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object; and rendering an effect of refraction on one of the objects in the 2D image viewable through the transparent or semi-transparent portion of the 3D object.
34. A method as recited in claim 33, further comprising allowing a user to apply masks for manipulating the image.
35. A method as recited in claim 34, further comprising allowing the user to mask a 2D object, and attaching a previously unassociated portion of the 2D image with the masked object.
36. A method for generating a motion picture, comprising: creating a sequence of frames, each frame having a two dimensional (2D) image of a real-life scene; performing the method of claim 33 for each frame in the sequence for creating a sequence of composite images; and allowing a user to review the composite images.
37. A method as recited in claim 36, further comprising allowing the user to manually change the composite images.
38. A method for generating a composite image, comprising: performing the method of claim 33 for creating a composite image, wherein the 3D object represents a physical building construction.
39. A method for generating a medical image, comprising: performing the method of claim 33 for creating a composite image, wherein the 3D object represents a human anatomical feature.
40. A method for generating a medical image, comprising: performing the method of claim 33 for creating a composite image, wherein the 3D object represents a medical device.
41. A method as recited in claim 40, wherein the 2D image includes a real-life human anatomical feature.
42. A method as recited in claim 33, wherein the composite image includes a landscape.
43. A method as recited in claim 33, wherein the composite image includes an interior of a building structure.
44. A method as recited in claim 33, further comprising providing the following features: slide show and movie output, batch processing of images, image labeling, matching color schemes across multiple images, color replacement for changing a color of selected portions of an image, shadow and highlight correction for improving the contrast of over- or underexposed areas of an image, photo touch-up including removal of redeye, dust, scratches, blemishes, wrinkles, dimensional effects including image wrapping, stretching, curling, and bending, removing image blurring, correcting lens distortion, image noise reduction, cropping and straightening an image, painting and drawing tools, and allowing addition of text and labels.
45. A method for generating an image, comprising the computer implemented steps of: analyzing a two dimensional (2D) image for determining several properties thereof, wherein the properties are selected from a group consisting of hue, saturation, brightness, red color content, green color content, and blue color content; smoothing the 2D image; detecting edges in the 2D image based on the properties of the 2D image; refining lines based on the detected edges in the 2D image; detecting shadows in the 2D image based at least in part on the properties of the 2D image; finding objects in the 2D image based at least in part on the properties of the 2D image; adding a three dimensional (3D) object to the 2D image; estimating which objects in the 2D image are positioned in front of the 3D object, or which objects in the 2D image are positioned behind the 3D object; rendering the following effects: a reflection of one of the objects in the 2D image on an outer surface of the 3D image, a shadow cast by one of the objects in the 2D image on an outer surface of the 3D image, a representation of one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object, and an effect of refraction on one of the objects in the 2D image viewable through a transparent or semi-transparent portion of the 3D object.
46. A method as recited in claim 45, further comprising adjusting a boundary of the detected shadows in the 2D image based on a proximity to the detected edges.
47. A method as recited in claim 46, further comprising adjusting the shadow cast by one of the objects in the 2D image on the outer surface of the 3D image to reflect an atmospheric condition in the 2D image.
48. A method as recited in claim 47, further comprising adjusting the outer surface of the 3D image based on an effect of an atmospheric condition in the 2D image.
49. A method as recited in claim 48, further comprising storing leveling data, wherein the finding the objects in the 2D image uses the leveling data.
50. A method as recited in claim 49, further comprising performing radiosity processing, caustics processing, adding motion blur to the 3D object, and performing depth of field focusing.
51. A method as recited in claim 45, wherein the method is performed for a series of images.
52. A method for identifying pixels in shadow in an image, comprising: analyzing an image for determining several properties thereof, the properties including hue (H), saturation (S), brightness (V), red color content (R), green color content (G), and blue color content (B); creating a first histogram of H/S values calculated for each pixel in the image; creating a second histogram of S/V values calculated for each pixel in the image; identifying a line feature in the first histogram; identifying a line feature in the second histogram; and marking pixels having an H/S value above the line feature in the first histogram and an S/V value above the line feature in the second histogram as being in shadow.
53. A method as recited in claim 52, further comprising storing leveling data, wherein the finding the objects in the image uses the leveling data.
54. A method as recited in claim 52, further comprising smoothing the image and detecting edges in the image.
55. A method as recited in claim 52, further comprising detecting edges in the image and refining lines based on the detected edges in the image.
56. A method as recited in claim 52, further comprising detecting edges in the image and adjusting a boundary of the detected shadows in the image based on a proximity to the detected edges.
57. A method as recited in claim 52, wherein marking the pixels as being in shadow further comprises: flagging pixels having an H/S value below the line feature in the first histogram as potentially not being in shadow; flagging pixels having an H/S value above the line feature in the first histogram as potentially being in shadow; flagging pixels having an S/V value below the line feature in the second histogram as potentially not being in shadow; flagging pixels having an S/V value above the line feature in the second histogram as potentially being in shadow; and marking pixels having two flags as being in shadow.
58. A method as recited in claim 52, wherein the method is performed for a series of images.
59. A computer program product, comprising: a computer readable medium having computer code thereon for processing an image, the computer code including: computer code for analyzing an image for determining several properties thereof, the properties including hue (H), saturation (S), brightness (V), red color content (R), green color content (G), and blue color content (B); computer code for creating a first histogram of H/S values calculated for each pixel in the image; computer code for creating a second histogram of S/V values calculated for each pixel in the image; computer code for identifying a line feature in the first histogram; computer code for identifying a line feature in the second histogram; and computer code for marking pixels having an H/S value above the line feature in the first histogram and an S/V value above the line feature in the second histogram as being in shadow.
60. A computer program product, comprising: a computer readable medium having computer code thereon for processing an image, the computer code including: computer code for generating slide show and movie output; computer code for performing batch processing of an image; computer code for allowing image labeling; computer code for matching color schemes across multiple images; computer code for performing color replacement for changing a color of selected portions of an image; computer code for performing shadow and highlight correction for improving contrast of over- or under-exposed areas of an image; computer code for performing photo touch-up functions including removal of redeye, dust, scratches, blemishes, wrinkles; computer code for performing dimensional effects on an image including image wrapping, stretching, curling, and bending; computer code for removing image blurring; computer code for correcting lens distortion; computer code for image noise reduction; computer code for cropping and straightening an image; computer code for providing painting and drawing tools; computer code for addition of text and labels; and computer code for manipulating shadows in an image.
61. A method for manipulating shadows of an image, comprising: analyzing an image for determining several properties thereof; determining which pixels of the image are in shadow; and performing at least one of shadow boundary adjustment and shadow boundary softening of the image for a group of pixels determined to be in shadow, an outer periphery of the pixels determined to be in shadow defining the shadow boundary.
62. A method as recited in claim 61, wherein the properties include hue (H), saturation (S), and brightness (V); wherein determining which pixels of the image are in shadow further comprises creating a first histogram of H/S values calculated for each pixel in the image; creating a second histogram of S/V values calculated for each pixel in the image; identifying a line feature in the first histogram; identifying a line feature in the second histogram; and marking pixels having an H/S value above the line feature in the first histogram and an S/V value above the line feature in the second histogram as being in shadow.
63. A method as recited in claim 61, further comprising detecting edges in the image and performing shadow boundary adjustment, wherein the shadow boundary adjustment includes adjusting a boundary of the group of pixels in shadow based on a proximity to the detected edges.
64. A method as recited in claim 63, wherein the proximity to the detected edge is based on a predefined number of pixels.
65. A method as recited in claim 63, wherein the proximity to the detected edge is based on a percentage of total pixels in the image.
66. A method as recited in claim 63, wherein the proximity to the detected edge is based on a percentage of horizontal pixel width of the image.
67. A method as recited in claim 63, wherein the proximity to the detected edge is based on a percentage of vertical pixel height of the image.
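Claims 64 through 67 express the same edge-proximity limit in four different units. As a sketch only, each variant can be normalized to a pixel count; the function name and mode strings are assumptions:

```python
def proximity_limit(mode, value, width, height):
    """Normalize the edge-proximity limit of claims 64-67 to a pixel count.

    mode: "pixels" (claim 64), "total"  (claim 65, fraction of all pixels),
          "width"  (claim 66), "height" (claim 67, fraction of one dimension).
    """
    if mode == "pixels":
        return int(value)
    if mode == "total":
        return int(value * width * height)
    if mode == "width":
        return int(value * width)
    if mode == "height":
        return int(value * height)
    raise ValueError("unknown proximity mode: %s" % mode)
```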
68. A method as recited in claim 61, wherein shadow boundary softening is performed, wherein the shadow boundary softening includes adjusting at least one of the properties of pixels along the boundary of the shadow based on a linear interpolation between properties of pixels in shadow and properties of pixels not in shadow.
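A one-dimensional sketch of the linear-interpolation softening in claim 68, applied here to a single property such as brightness; a real pass would interpolate whichever properties are being softened across the actual boundary pixels. The function name is illustrative.

```python
def soften_linear(shadow_value, lit_value, steps):
    """Ramp one property (e.g. brightness) across a shadow boundary.

    Returns `steps` values linearly interpolated from the in-shadow value
    to the lit value; a real pass would write these into boundary pixels.
    """
    if steps == 1:
        return [shadow_value]
    span = lit_value - shadow_value
    return [shadow_value + span * i / (steps - 1) for i in range(steps)]
```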
69. A method as recited in claim 61, wherein shadow boundary softening is performed, wherein the shadow boundary softening includes analyzing pixels positioned towards the shadow boundary, and adjusting at least one of the properties of the pixels being analyzed based on an average of similar properties of pixels surrounding each pixel being analyzed.
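Similarly, the averaging variant in claim 69 can be sketched in one dimension: each analyzed pixel takes the mean over a small window of its neighbors. The window shape and default radius are assumptions; an image pass would average a 2-D window around each pixel near the shadow boundary.

```python
def soften_average(values, index, radius=1):
    """Replace one pixel's property with the mean over its neighborhood.

    One-dimensional for brevity; the window is clipped at the ends of the
    sequence so border pixels average over fewer neighbors.
    """
    lo = max(0, index - radius)
    hi = min(len(values), index + radius + 1)
    window = values[lo:hi]
    return sum(window) / len(window)
```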
70. A method as recited in claim 61, further comprising providing the following features: slide show and movie output of multiple images, batch processing of multiple images, image labeling, matching color schemes across multiple images, color replacement for changing a color of selected portions of an image, shadow and highlight correction for improving the contrast of over- or under-exposed areas of an image, photo touch-up including removal of redeye, dust, scratches, blemishes, wrinkles, dimensional effects including image wrapping, stretching, curling, and bending, removing image blurring, correcting lens distortion, image noise reduction, cropping and straightening an image, painting and drawing tools, and allowing addition of text and labels.
71. A method as recited in claim 61, wherein the method is performed for a series of images.
72. A computer program product, comprising: a computer readable medium having computer code thereon for processing an image, the computer code including: computer code for analyzing an image for determining several properties thereof; computer code for determining which pixels of the image are in shadow; and computer code for performing at least one of shadow boundary adjustment and shadow boundary softening of the image for a group of pixels determined to be in shadow, an outer periphery of the pixels determined to be in shadow defining the shadow boundary.
73. A method for identifying an object in an image, comprising: analyzing an image for determining several properties thereof; finding objects in the image based at least in part on the properties of the image; selecting a first object determined to be at least primarily in shadow; selecting a second object adjacent the first object; analyzing the two objects using the properties of the image for determining whether the two objects are part of a same overall object.
74. A method as recited in claim 73, wherein the objects in the image are found using a flood fill algorithm.
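Claim 74's flood-fill object finding can be sketched as a standard stack-based fill over a grid of property values. In practice the match test would compare hue, saturation, and brightness within tolerances rather than exact equality; this minimal version uses exact matching for clarity.

```python
def flood_fill(grid, start, match):
    """Collect the 4-connected region of cells equal to `match`.

    `grid` is a list of equal-length rows; `start` is a (row, col) tuple.
    Returns the set of coordinates in the connected region.
    """
    rows, cols = len(grid), len(grid[0])
    stack, region = [start], set()
    while stack:
        r, c = stack.pop()
        if (r, c) in region or not (0 <= r < rows and 0 <= c < cols):
            continue
        if grid[r][c] != match:
            continue
        region.add((r, c))
        # push the four neighbors; out-of-range ones are rejected above
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region
```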
75. A method as recited in claim 73, wherein the objects in the image are found using a histogram.
76. A method as recited in claim 73, wherein the method is performed for a series of images.
77. A method as recited in claim 76, wherein at least some of the objects in the image are found using motion-based detection.
78. A method as recited in claim 73, wherein the second object is not in shadow.
79. A method as recited in claim 73, wherein analyzing the two objects includes: comparing a ratio of an average red (R) value of pixels in the first object to an average R value of pixels in the second object against a predetermined value; comparing a ratio of an average green (G) value of pixels in the first object to an average G value of pixels in the second object against a predetermined value; comparing a ratio of an average blue (B) value of pixels in the first object to an average B value of pixels in the second object against a predetermined value; comparing a ratio of an average saturation (S) value of the first object to an average S value of the second object against a predetermined value; comparing a ratio of an average brightness (V) value of the first object to an average V value of the second object against a predetermined value; wherein if the comparisons of the ratios to the predetermined values match predefined criteria, the first and second objects are determined to be in the same overall object.
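A sketch of the ratio test in claim 79 over the five channel averages. The symmetric [lo, hi] acceptance band is an assumed stand-in for the patent's "predetermined values" and "predefined criteria"; real thresholds could differ per channel.

```python
def same_overall_object(avg_a, avg_b, lo=0.5, hi=2.0):
    """Ratio test over the five channel averages named in claim 79.

    `avg_a` and `avg_b` map channel names to the average values of the
    two objects; every ratio must fall inside [lo, hi] for the objects
    to be judged parts of the same overall object.
    """
    for channel in ("R", "G", "B", "S", "V"):
        if not avg_b[channel]:
            return False  # avoid dividing by a zero average
        ratio = avg_a[channel] / avg_b[channel]
        if not (lo <= ratio <= hi):
            return False
    return True
```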
80. A computer program product, comprising: a computer readable medium having computer code thereon for processing an image, the computer code including: computer code for analyzing an image for determining several properties thereof; computer code for finding objects in the image based at least in part on the properties of the image; computer code for determining whether one of the objects of the image is at least primarily in shadow; computer code for selecting a first object determined to be at least primarily in shadow; computer code for selecting a second object adjacent the first object; computer code for analyzing the two objects using the properties of the image for determining whether the two objects are part of a same overall object.
81. A method for removing shadows from an image, comprising: analyzing an image for determining several properties thereof; finding objects in the image based at least in part on the properties of the image; selecting a first object determined to be at least primarily in shadow; selecting a second object adjacent the first object, the second object not being primarily in shadow; analyzing the two objects using the properties of the image for determining whether the two objects are part of a same overall object; calculating new red (R) values for pixels in the first object based on a ratio of an average R value of the pixels in the first object to an average R value of pixels in the second object; calculating new green (G) values for pixels in the first object based on a ratio of an average G value of the pixels in the first object to an average G value of pixels in the second object; and calculating new blue (B) values for pixels in the first object based on a ratio of an average B value of the pixels in the first object to an average B value of pixels in the second object.
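A sketch of the per-channel rescale in claim 81, assuming the two regions have already been matched as parts of the same overall object. Rescaling by the lit/shadow average ratio makes the shadowed region's channel averages match the lit region's; the clamp to 1.0 is an added safeguard, not part of the claim.

```python
def remove_shadow(shadow_pixels, lit_pixels):
    """Rescale shadowed R, G, B values by the lit/shadow average ratios.

    Both arguments are lists of (r, g, b) tuples in [0, 1]; returns the
    shadowed pixels with each channel multiplied by its gain.
    """
    gains = []
    for ch in range(3):
        avg_shadow = sum(p[ch] for p in shadow_pixels) / len(shadow_pixels)
        avg_lit = sum(p[ch] for p in lit_pixels) / len(lit_pixels)
        gains.append(avg_lit / avg_shadow if avg_shadow else 1.0)
    return [tuple(min(1.0, p[ch] * gains[ch]) for ch in range(3))
            for p in shadow_pixels]
```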
82. A method as recited in claim 81, wherein the objects in the image are found using a flood fill algorithm.
83. A method as recited in claim 81, wherein the objects in the image are found using a histogram.
84. A method as recited in claim 81, wherein the method is performed for a series of images.
85. A method as recited in claim 84, wherein at least some of the objects in the image are found using motion-based detection.
86. A method as recited in claim 81, further comprising allowing user manipulation of a hue (H) of the first object.
87. A method as recited in claim 81, further comprising allowing user manipulation of a saturation (S) of the first object.
88. A method as recited in claim 81, further comprising allowing user manipulation of a brightness (V) of the first object.
89. A computer program product, comprising: a computer readable medium having computer code thereon for processing an image, the computer code including: computer code for finding objects in the image based at least in part on the properties of the image; computer code for determining whether one of the objects of the image is at least primarily in shadow; computer code for selecting a first object determined to be at least primarily in shadow; computer code for selecting a second object adjacent the first object, the second object not being primarily in shadow; computer code for analyzing the two objects using the properties of the image for determining whether the two objects are part of a same overall object; computer code for calculating new red (R) values for pixels in the first object based on a ratio of an average R value of the pixels in the first object to an average R value of pixels in the second object; computer code for calculating new green (G) values for pixels in the first object based on a ratio of an average G value of the pixels in the first object to an average G value of pixels in the second object; and computer code for calculating new blue (B) values for pixels in the first object based on a ratio of an average B value of the pixels in the first object to an average B value of pixels in the second object.
PCT/US2006/040855 2005-10-28 2006-10-17 Automatic compositing of 3d objects in a still frame or series of frames and detection and manipulation of shadows in an image or series of images WO2007145654A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11/262,262 2005-10-28
US11/262,262 US7477777B2 (en) 2005-10-28 2005-10-28 Automatic compositing of 3D objects in a still frame or series of frames
US11/271,532 2005-11-09
US11/271,532 US7305127B2 (en) 2005-11-09 2005-11-09 Detection and manipulation of shadows in an image or series of images

Publications (1)

Publication Number Publication Date
WO2007145654A1 true WO2007145654A1 (en) 2007-12-21

Family

ID=38832042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/040855 WO2007145654A1 (en) 2005-10-28 2006-10-17 Automatic compositing of 3d objects in a still frame or series of frames and detection and manipulation of shadows in an image or series of images

Country Status (1)

Country Link
WO (1) WO2007145654A1 (en)

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
AGARWAL ET AL.: "Learning a Sparse Representation for Object Detection", COMPUTER VISION - ECCV 2002: 7TH EUROPEAN CONFERENCE ON COMPUTER VISION, COPENHAGEN, DENMARK, PROCEEDINGS, PART IV, 28 May 2002 (2002-05-28) - 31 May 2002 (2002-05-31) *
CUCCHIARA ET AL.: "Detecting Moving Objects, Ghosts, and Shadows in Video Streams", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 25, no. 10, October 2003 (2003-10-01) *
CUCCHIARA R. ET AL.: "Detecting Objects, Shadows and Ghosts in Video Streams by Exploiting Color and Motion Information", IMAGE ANALYSIS AND PROCESSING, 2001. PROCEEDINGS. 11TH INTERNATIONAL CONFERENCE, 26 September 2001 (2001-09-26) - 28 September 2001 (2001-09-28), pages 360 - 365 *
HALLER M. ET AL.: "A real-time shadow approach for an Augmented Reality application using shadow volumes", ACM SYMPOSIUM ON VIRTUAL REALITY SOFTWARE, 2003 *
JACOBS J.D. ET AL.: "Automatic generation of Consistent Shadows for Augmented Reality", PROCEEDINGS OF THE 2005 CONFERENCE ON GRAPHICS INTERFACE, May 2005 (2005-05-01) *
LI N. ET AL.: "Real-Time Video Object Segmentation Using HSV Space", IMAGE PROCESSING. 2002. PROCEEDINGS. 2002 INTERNATIONAL CONFERENCE *
LOSCOS C. ET AL.: "Interactive Virtual Relighting of Real Scenes", IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, vol. 6, no. 4, December 2000 (2000-12-01) *
POMI A. ET AL.: "Streaming Video Textures for Mixed Reality Applications in Interactive Ray Tracing Environments", PROCEEDINGS OF VIRTUAL REALITY, MODELLING AND VISUALIZATION, 2003 *
WLOKA ET AL.: "Interactive real-time motion blur", THE VISUAL COMPUTER, vol. 12, no. 6, June 1996 (1996-06-01) *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9100640B2 (en) * 2010-08-27 2015-08-04 Broadcom Corporation Method and system for utilizing image sensor pipeline (ISP) for enhancing color of the 3D image utilizing z-depth information
US20120050484A1 (en) * 2010-08-27 2012-03-01 Chris Boross Method and system for utilizing image sensor pipeline (isp) for enhancing color of the 3d image utilizing z-depth information
US9886931B2 (en) 2012-03-06 2018-02-06 Apple Inc. Multi operation slider
US10282055B2 (en) 2012-03-06 2019-05-07 Apple Inc. Ordered processing of edits for a media editing application
US9105121B2 (en) 2012-03-06 2015-08-11 Apple Inc. Image editing with user interface controls overlaid on image
US9131192B2 (en) 2012-03-06 2015-09-08 Apple Inc. Unified slider control for modifying multiple image properties
US9159144B2 (en) 2012-03-06 2015-10-13 Apple Inc. Color adjustors for color segments
US9202433B2 (en) 2012-03-06 2015-12-01 Apple Inc. Multi operation slider
US9299168B2 (en) 2012-03-06 2016-03-29 Apple Inc. Context aware user interface for image editing
US11481097B2 (en) 2012-03-06 2022-10-25 Apple Inc. User interface tools for cropping and straightening image
US8971617B2 (en) 2012-03-06 2015-03-03 Apple Inc. Method and interface for converting images to grayscale
US9092893B2 (en) 2012-03-06 2015-07-28 Apple Inc. Method and interface for converting images to grayscale
US10545631B2 (en) 2012-03-06 2020-01-28 Apple Inc. Fanning user interface controls for a media editing application
US10552016B2 (en) 2012-03-06 2020-02-04 Apple Inc. User interface tools for cropping and straightening image
US11119635B2 (en) 2012-03-06 2021-09-14 Apple Inc. Fanning user interface controls for a media editing application
US10942634B2 (en) 2012-03-06 2021-03-09 Apple Inc. User interface tools for cropping and straightening image
US10936173B2 (en) 2012-03-06 2021-03-02 Apple Inc. Unified slider control for modifying multiple image properties
AU2018250354B2 (en) * 2016-04-15 2020-11-19 Deere And Company Method and System for Extracting Shadows from an Image during Optical Based Selective Treatment of an Agricultural Field
US10740610B2 (en) 2016-04-15 2020-08-11 University Of Southern Queensland Methods, systems, and devices relating to shadow detection for real-time object identification
WO2017177284A1 (en) * 2016-04-15 2017-10-19 University Of Southern Queensland Methods, systems, and devices relating to shadow detection for real-time object identification

Similar Documents

Publication Publication Date Title
US7305127B2 (en) Detection and manipulation of shadows in an image or series of images
US7889913B2 (en) Automatic compositing of 3D objects in a still frame or series of frames
US11386528B2 (en) Denoising filter
Paris A gentle introduction to bilateral filtering and its applications
WO2007145654A1 (en) Automatic compositing of 3d objects in a still frame or series of frames and detection and manipulation of shadows in an image or series of images
US8289318B1 (en) Determining three-dimensional shape characteristics in a two-dimensional image
RU2368006C1 (en) Method and system for adaptive reformatting of digital images
US8041140B1 (en) Healing by texture synthesis in differential space
EP1372109A2 (en) Method and system for enhancing portrait images
US7577313B1 (en) Generating synthesized texture in differential space
WO2001026050A2 (en) Improved image segmentation processing by user-guided image processing techniques
Aliaga et al. A virtual restoration stage for real-world objects
Banerjee et al. In-camera automation of photographic composition rules
US11348303B2 (en) Methods, devices, and computer program products for 3D texturing
CN111681198A (en) Morphological attribute filtering multimode fusion imaging method, system and medium
CN112288726B (en) Method for detecting foreign matters on belt surface of underground belt conveyor
Pan et al. Color adjustment in image-based texture maps
CN109448010B (en) Automatic four-side continuous pattern generation method based on content features
Zhang et al. Region-adaptive texture-aware image resizing
Kim et al. Nonlinear operators for edge detection and line scratch removal
Schumacher et al. Hallucination of facial details from degraded images using 3D face models
Melendez et al. Transfer of albedo and local depth variation to photo-textures
Kim et al. Region removal and restoration using a genetic algorithm with isophote constraint
KR102606373B1 (en) Method and apparatus for adjusting facial landmarks detected in images
Suciati et al. Converting image into bas reliefs using image processing techniques

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 06851286; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC, EPO FORM 1205A, 02/09/2008)
122 Ep: pct application non-entry in european phase (Ref document number: 06851286; Country of ref document: EP; Kind code of ref document: A1)