WO2011029209A2 - Method and apparatus for generating and processing depth-enhanced images - Google Patents


Info

Publication number
WO2011029209A2
Authority
WO
WIPO (PCT)
Prior art keywords: scene, image, elements, depth, information
Application number
PCT/CH2010/000218
Other languages
French (fr)
Other versions
WO2011029209A3 (en)
Inventor
Christoph Niederberger
Stephan Würmlin Stadler
Richard Keiser
Remo Ziegler
Marco Feriencik
Marcel Germann
Marcel Müller
Original Assignee
Liberovision Ag
Application filed by Liberovision Ag
Publication of WO2011029209A2
Publication of WO2011029209A3


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation

Definitions

  • annotation elements may be called simply "virtual elements", as opposed to elements of the scene that represent real world objects such as players in a sports scene, and incorporate image and/or position information from these real world objects.
  • Scene elements representing the real world objects may be inserted, according to their real position, and/or with their associated real image data into a 3D model and then rendered again for viewing on a display.
  • Such scene elements shall also be considered “real”.
  • the scene may also be generated in a purely virtual manner, that is, without image information from a real, static or dynamically unfolding recorded or live event.
  • component: an element of an image, depending on the manner in which the image is represented. Typically, a component corresponds to a pixel.
  • connected component: a number of components that are considered together as a unit. In the context of an image, such a unit may be called a blob. In the context of the scene, such a unit often corresponds to an object.
  • Fig. 2a-b: issues involved in 3D graphic annotations
  • Fig. 3: a system for recording, processing and displaying depth-enhanced images
  • Fig. 4: different stages in image processing
  • Fig. 5: a flow diagram of a method for enhancing images by creating a 3D-enhanced representation incorporating annotations
  • Fig. 6: a flow diagram of a method for enhancing images by creating a 3D-enhanced representation for displaying it or rendering it on a 3D display device
  • Fig. 7: an apparatus for receiving image information and for generating and displaying depth-enhanced images.
  • Fig. 3 shows, schematically, a system for recording, processing and displaying depth- enhanced images.
  • a physical camera 1 and further observation devices are arranged to observe a scene 10 comprising a background 11 and objects 12.
  • the further observation devices may be further physical camera(s) 2a, position determining system(s) 2b, distance scanner(s) 2b.
  • a virtual camera 3 is characterized by calibration data (position, orientation, optical parameters), and the system is configured to generate a view of the scene 10 as it would be seen by the virtual camera 3.
  • the system further comprises a computer-readable storage means 4, a data processing unit programmed to operate as a depth analysis processor 5, one or more user interface devices such as a display 6a and an input device 6b (such as pointing device, keyboard, and the like), a data processing unit programmed to operate as a rendering processor 7, and a 2D or 3D display 8.
  • Fig. 4 shows, schematically, different stages in image processing as performed by the system: a) original image with stadium background 21, playing field background 22 and players 23 (in a highly schematic representation). b) images segmented into background and different scene elements. Each player is represented by an individual segment, i.e. a blob of pixels. Overlapping players (not shown) may be represented by just one blob comprising the pixels corresponding to both players. c) annotation element 24 in a desired 2D position in the image. d) scene and annotation element 24 rendered, taking into account the distance ordering of each pixel.
  • the system and in particular the depth analysis processor 5 and rendering processor 7 are configured to execute the methods according to one or both of the flowcharts according to Figs. 5 and 6.
  • Fig. 7 schematically shows an apparatus for receiving image information and for generating and displaying depth-enhanced images.
  • the apparatus comprises a receiving unit 9 for receiving broadcast images and additional information, a depth analysis processor 5 and a rendering processor 7.
  • the apparatus may be embodied as a separate device connected to the 3D display 8, or it may be incorporated, together with the 3D display 8, in a common housing. If no user interaction for annotating scenes is required, no dedicated input and output devices are provided.
  • Calibration information for at least one of the images. Calibration information includes at least camera position, focus and orientation information. It can include other parameters (distortion, etc.). This is not required for method C2 described in the Processing section below.
  • color model information that can be used to separate background from foreground in the image(s).
  • the color model includes different information for different foreground objects (e.g. players, referees, ball, etc.) to distinguish between different foreground objects as well.
  • the color model is, for example, determined with user interaction, e.g. by having a user assign labels to blobs in a segmented image, e.g. by identifying particular blobs as being foreground objects or even distinguishing such objects as being part of a particular team. From this, the system learns the color distribution associated with this team. Color models may be either replaced or supplemented with other useful information for separation of foreground and background, such as shape, edge, or priors/templates. Furthermore, the color model can be learned automatically from the team's jersey colors (available from the clubs or the associations).
  • One or more video images showing the same [sports] scene at a different time, e.g. one frame before or after the input image mentioned above.
  • Step 1: Determine a distance measure for each pixel of an image from a (given or virtual) camera showing the same scene.
  • the distance measure is not required to be a metric - the only requirement is that one is able to compare two measures and determine which one is smaller than the other, that is, which of two or more entities such as a pixel or blob or object etc., each being associated with a measure, lies closer to the camera.
  • Method A: Use information from an external device (laser scanner, object tracking device/method, e.g. by (differential) GPS or RF triangulation, etc.), for example.
  • Pixel-wise distance information for each input image pixel is directly available.
  • If the device is not positioned at essentially the same location as the camera: reproject the scanner's 3D information into the camera space and optionally perform filtering to reduce noise in the distance measurements. In other words: transform the 3D information into 3D surfaces as seen by the camera, and project the camera image onto the 3D surfaces.
  • Such a stereo algorithm can use the color information in order to pre-segment the input image into foreground and background (e.g. playing field, stadium) pixels/parts (see Method C).
  • Another variant is to assume a default calibration (e.g. from previous images from the same camera, from arbitrary assumptions, or just from a standard default calibration representing a typical camera setup). A distance can then be assigned by intersecting a ray originating from the center of projection of the camera through each object's lowest pixel with the field plane, and assigning that distance to all pixels belonging to the object according to the separation/segmentation.
  • a pixel-wise segmentation, without requiring a color model, can be performed to get a classification into fore- and background ("Background Segmentation"). This can be done by subtracting the empty scene image, as projected according to the view seen by the camera providing the input image, from the input image. Alternatively, a statistical method can be used, assuming that the color seen most often on a background surface, over time and in different views, is the color of the background itself (see the sketch at the end of this section).
  • Assigning each object a (depth) label generates a "depth map" in which each pixel of the same object has the same depth value. This guarantees a consistent depth over the entire object.
  • the annotation elements are inserted by user interaction. Typically, this is done by the user drawing, with a pointing device, on a view/image of the scene. This is explained in further detail below, under "other aspects".
  • the data can be transformed into the specific format required by the available 3D display.
  • a pointing device such as a pen or mouse or finger marks a pixel, which corresponds to a ray from the viewpoint through that pixel on the viewing plane. Therefore, it also corresponds to an infinite number of potential depth values. From a geometrical point of view, it is not obvious which depth value is "correct" or user-desired.
  • the pointing device position is interpreted as indicating a position on the ground, i.e. the 3D point chosen to correspond to the pointing device's position is the one where the ray from the viewpoint passes through the ground. This is like "painting" on the ground.
  • If 3D annotation objects are supposed to appear at a certain height above that ground position, there is the problem that the object does not appear at the location where the user is interacting with the image.
  • Input depth along ray from viewpoint, or equivalent 3D position information, e.g. from an interaction as described above.
  • Pre-computed collision map or similar: Calculate valid areas/volumes where no object is situated. This can be done in 3D, by intersecting volumes of scene elements and annotation elements, or in 2D, by intersecting areas, where the areas are defined by a vertical projection of the scene and annotation elements onto the playing field. This can be simplified by assuming scene elements (players) to have a fixed shape such as an upright cylinder of fixed dimensions. If the user inserts an annotation element at/through such an area/volume in which it intersects a scene element, the annotation element is automatically and dynamically readjusted, e.g. by bending the annotation element around the scene element. In situations where an annotation element has a variable shape, e.g. an arrow with fixed start and end points and with other control points in between, and the user moves one of the control points, the intersection detection preferably remains in operation during movement of the control point and causes the line to snap to a trajectory where there is no intersection.
  • a distance/rank can be manually assigned to that scene element or object, which will cause that object to be rendered in front of or behind the annotation.
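
The background segmentation variant mentioned above can be sketched as follows. This is an illustrative example only, not code from the patent; the function name segment_foreground, the threshold and the toy image are assumptions. Pixels are classified as foreground or background either by subtracting a reprojected empty-scene image, or by estimating the background statistically as the per-pixel median color over several frames.

    import numpy as np

    def segment_foreground(frame, empty_background=None, history=None, threshold=30.0):
        """Classify each pixel of `frame` (H x W x 3, uint8) as foreground (True)
        or background (False).

        Two variants, matching the text above:
          - if `empty_background` is given, subtract the empty-scene image as seen
            from the same viewpoint and threshold the color difference;
          - otherwise, estimate the background statistically from `history`
            (a list of frames from the same view) as the per-pixel median color,
            assuming the color seen most often at a background surface is the
            background itself.
        """
        frame = frame.astype(np.float32)
        if empty_background is not None:
            reference = empty_background.astype(np.float32)
        elif history:
            reference = np.median(np.stack([h.astype(np.float32) for h in history]), axis=0)
        else:
            raise ValueError("need either an empty background image or a frame history")
        diff = np.linalg.norm(frame - reference, axis=2)   # per-pixel color distance
        return diff > threshold                            # True where an object covers the background

    # Toy example: a green field with one bright "player" blob
    field = np.zeros((120, 160, 3), dtype=np.uint8)
    field[:] = (40, 140, 40)
    frame = field.copy()
    frame[50:90, 70:80] = (200, 30, 30)
    mask = segment_foreground(frame, empty_background=field)
    print("foreground pixels:", int(mask.sum()))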

Abstract

A method for generating and processing depth-enhanced images comprises the steps of: - providing an image of a scene (10); - computing a depth-enhanced representation of the scene (10) comprising information on the relative position of scene elements in (real or virtual) 3D space, wherein each scene element (23) corresponds to a particular object (12) in the scene and to an image segment generated by observing the object with the physical camera; - inputting one or more annotation elements (24) and information on the relative position of the one or more annotation elements (24) with regard to the scene elements (23); - defining camera parameters of a viewing camera; - rendering, by means of a rendering processor (7), a rendered image as seen by the viewing camera, wherein the one or more annotation elements (24) are shown in a spatially consistent relation to the scene elements (23).

Description

METHOD AND APPARATUS FOR
GENERATING AND PROCESSING DEPTH-ENHANCED IMAGES
FIELD OF THE INVENTION
The invention relates to the field of digital image processing, and in particular to a method as described in the preamble of the independent claims.
BACKGROUND OF THE INVENTION
Annotation Methods
It is known to annotate scenes of sports events on television by manually painting on a still image of a scene, or by automatically inserting markers (e.g. touchdown line) that appear to be part of the scene, with correct location and perspective.
Annotating sports scenes, for example on television, can be done in one of the following ways:
1. By drawing over the image in 2D (Fig. 1a).
2. By drawing using chromakeying or similar segmentation methods to distinguish between fore- and background (Fig. 1b).
3. By drawing in a calibrated camera "on the field" in perspectively correct 2D, and possibly using chromakeying or similar segmentation methods to distinguish between fore- and background (Fig. 1c).
4. By drawing over the scene in perspectively correct "3D" (e.g. commercial content) (Fig. 1d).
Output device methods
Until today, such content has been shown on conventional 2D displays. Some content has been created for 3D displays, mainly based on either stereoscopic recording (with 2 cameras) of a real scene, or artificial rendering from 3D computer graphics models.
For such displays, different methods exist:
Stereoscopic displays: two views are generated (from the positions where the viewer's eyes are expected) and
o combined into one image (red/blue or red/green) [Anaglyph]
o polarized orthogonally over each other [Polarized]
o shown one after the other and viewed with so-called shutter glasses, where the glasses shutter the view in sync with the display and the shown image (left/right) [Time-multiplexing]
Autostereoscopic displays:
o Parallax barriers/illumination: a visual medium partitions the display image into different views
o Lenticular sheets: cylindrical vertical lenses refract light from the display into the different views
Computer generated holography
o reproduction of the holographic light interference pattern
Comparison: the original publication reproduces a table comparing these display methods as figure imgf000003_0001 (cross talk describes the effect of one eye seeing a portion of the other eye's view).
SPECIFIC PROBLEM
Annotations / virtual elements
The above methods do not provide a full 3D immersive perception of annotations. In particular, when it is possible to move a virtual camera around a scene and generate a view as seen by the virtual camera, the 3D effect is imperfect due to wrong occlusion effects. Consequently, elements of the scene and annotation elements are not displayed in a consistent spatial relationship. Ideally, the virtual elements are inserted into a scene such that they look as if they were actually a part of the scene. See the following two examples (Fig. 2a and 2b) of the desired rendering of spatial relationships. Here, as in the other examples, the cylindrical objects represent objects (e.g. players in a sports event) of the real or virtual scene, and the arrows are examples of annotation elements. The scene, or at least one image of the scene, is assumed to be provided, and the annotations are to be inserted into the scene such that they appear in a realistic fashion. The annotation elements (typically modeled as 3D computer graphics objects) are perceived by a user, by means of a 2D or 3D display device, as 3D objects or as surfaces located in 3D space and being part of the scene:
Fig. 2a: The arrow is in front of the middle object and goes around the left object (partly in front, partly behind).
Fig. 2b: The arrow passes in between the objects, in front of the middle one and behind the others.
With color segmentation, such effects are not possible, since such methods have no knowledge of an ordering or of the distance from the camera. But as both examples show, such information is absolutely necessary: there is no way to distinguish between the middle object and the other objects on the basis of the color information alone. A workaround for this problem, in the context of sports annotation, where the 3D objects are players on a flat playing field, is to render annotation elements either onto the playing field (that is, behind all the players and in front of the playing field) or in the air above the players' heads (that is, in front of all the players and the playing field). However, it is not possible to add annotation elements at a perceived height of, say, between 0 and 2 meters, as in the above examples.
Moreover, with LiberoVision's 3D replay technology (as described in patent application PCT/CH2007/000265, filed 24.05.2007, which is hereby incorporated in its entirety by reference), this effect becomes even more valuable. What is desired is the realistic integration of truly 3D annotation elements into the scene: elements that change their appearance (only visually, that is, when perceived from a different viewpoint, while remaining unchanged geometrically) as the virtual camera is moved around the scene.
3D Display output
Regarding the output, novel 3D displays create the impression of a three-dimensional picture by conveying a notion of 3D/depth to the viewer. Such displays are especially suited for the above-described content.
Several formats exist for the images or image sequences sent to such displays. These usually require either an image of the scene with some kind of depth information, or two images (stereoscopic), or the image including depth information for the objects together with an "empty" background image (that is, without foreground objects) and its depth, or a full 3D representation of the scene.
Thus, in order to bring a scene, typically recorded with an ordinary sensing device such as, e.g., a TV camera, and in particular a sports scene, to such a display, the availability or the creation of additional information is required. This information can be partially obtained by using specific devices (e.g. a depth scanner) or by analyzing and interpreting the image itself. The recorded and transmitted image or image sequence, according to the current state of the art, does not provide that required additional information.
One object of the invention is to generate such additional information for a 3D display, in particular for known 3D display types as described in the above.
SUMMARY OF THE INVENTION
Two main aspects of the invention are described in the accompanying claims, corresponding to at least two distinct and independently realizable embodiments according to the respective independent claims.
According to a first main aspect of the invention, a method for generating and processing depth-enhanced images is provided, comprising the steps of
• providing, by means of a physical camera or a storage device, an image of a scene, the image being a still image or an image in a sequence of video images;
• computing, by means of a depth analysis processor, from the image a depth-enhanced representation of the scene, the depth-enhanced representation comprising information on the relative position of scene elements in (real or virtual) 3D space, wherein each scene element corresponds to a particular object in the scene and to an image segment generated by observing the object with the physical camera;
• inputting, by means of an input device, one or more annotation elements and information on the relative position of the one or more annotation elements with regard to the scene elements;
• defining camera parameters (position, orientation and lens settings) of at least one viewing camera, the viewing camera parameters being identical to or an approximation to those of the physical camera, or being parameters of a virtual camera;
• rendering, by means of a rendering processor, at least one rendered image as seen by the at least one viewing camera, wherein the one or more annotation elements are shown in a spatially consistent relation to the scene elements.
In a preferred embodiment of the invention, the at least one rendered image is displayed on a 2D or 3D-display. Thereby, in the at least one rendered image, the inserted annotation elements appear, according to their location in 3D space, to lie behind or in front of the scene elements.
In a preferred embodiment of the invention, the depth-enhanced representation of the scene comprises, for each pixel of the entire image or of part of the image, distance ordering information, and the relative position of scene elements and annotation elements is expressed by their distance ordering information. Preferably, the distance ordering information is one of a distance to the viewing camera and a relative ordering according to distance to the viewing camera. The relative ordering of distance indicates which of two arbitrarily selected pixels is associated with an object that is closer to the virtual camera.
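
As a minimal illustration of this point (hypothetical code; the names closer_pixel and ordering_map are not taken from the patent), metric depth values and pure rank labels can be handled by the same comparison, since only the ordering is ever needed:

    import numpy as np

    def closer_pixel(ordering_map, p, q):
        """Return the pixel (row, col) of p, q that is closer to the viewing camera.

        `ordering_map` holds per-pixel distance ordering information: either a
        metric distance to the viewing camera (float) or a relative rank label
        (int, lower = closer). Only comparisons are used, so no metric is required.
        """
        return p if ordering_map[p] < ordering_map[q] else q

    # Metric depths (meters from the camera) ...
    depth = np.array([[12.0, 12.0], [3.5, 55.0]])
    # ... or pure ordering labels (0 = closest) describe the same situation.
    rank = np.array([[1, 1], [0, 2]])

    print(closer_pixel(depth, (1, 0), (0, 1)))  # (1, 0): the 3.5 m pixel wins
    print(closer_pixel(rank,  (1, 0), (0, 1)))  # (1, 0): rank 0 beats rank 1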
In a preferred embodiment of the invention, the depth-enhanced representation of the scene comprises a mapping of segments of the image onto scene elements of a 3D representation of the scene, and the relative position of scene elements and annotation elements is expressed by their location in the 3D representation of the scene.
In yet another preferred embodiment of the invention, the method comprises the steps of:
• by means of one or more further observation devices or the storage device, providing further information;
• when computing the depth-enhanced representation of the scene by means of the depth analysis processor, taking into account, in addition to the image, this further information;
• wherein the one or more further observation devices comprise one or more of distance measuring scanners, further physical cameras, position determining systems.
In another preferred embodiment of the invention, two or more viewing cameras are defined, and an image is rendered for each of the viewing cameras, resulting in a pair of stereoscopic images or a group of multiscopic images; and optionally in a video sequence of pairs or groups of images.
The camera calibration parameters for the viewing camera can be set to be identical to the calibration parameters of an existing camera, in particular the one that provided the original image. Alternatively, the parameters of the viewing camera can be modified interactively by a user, generating a virtual view.
In a further preferred embodiment of the invention, the method, in the step of inputting one or more annotation elements, comprises the step of computing a 3D intersection of these annotation elements with scene elements, detecting an intersection and optionally indicating to a user that an intersection has been detected, and/or optionally also automatically correcting the shape or the position of the annotation element such that no intersection occurs, e.g. by stretching an annotation element to pass around a scene element in the virtual 3D space.
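
The intersection test can be sketched as follows, under the simplifying assumption (used again further below) that scene elements are approximated by upright cylinders of fixed dimensions standing on the playing field; the function name, dimensions and sampling scheme are illustrative, not the patent's own:

    import numpy as np

    def intersects_cylinder(polyline, center_xy, radius, height, samples_per_segment=20):
        """Check whether a 3D polyline (list of (x, y, z), z = height above the
        playing field) penetrates an upright cylinder standing at `center_xy`."""
        polyline = np.asarray(polyline, dtype=float)
        for a, b in zip(polyline[:-1], polyline[1:]):
            for t in np.linspace(0.0, 1.0, samples_per_segment):
                p = (1.0 - t) * a + t * b                       # sample point on the segment
                horizontal = np.linalg.norm(p[:2] - center_xy)  # distance to the cylinder axis
                if horizontal < radius and 0.0 <= p[2] <= height:
                    return True
        return False

    # A player approximated by a cylinder of radius 0.4 m and height 1.9 m at (5, 5)
    player = (np.array([5.0, 5.0]), 0.4, 1.9)

    # An arrow drawn at 1.2 m height, passing straight through the player
    arrow = [(0.0, 5.0, 1.2), (10.0, 5.0, 1.2)]
    print(intersects_cylinder(arrow, *player))       # True: warn the user or bend the arrow

    # The same arrow lifted above head height no longer intersects
    arrow_high = [(0.0, 5.0, 2.5), (10.0, 5.0, 2.5)]
    print(intersects_cylinder(arrow_high, *player))  # False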
In a further preferred embodiment of the invention, the step of inputting information on the relative position of the one or more annotation elements comprises the step of interpreting the position inputted by the input device as indicating a position on a ground plane in the scene.
In a further preferred embodiment of the invention, the step of inputting information on the relative position of the one or more annotation elements comprises the step of interpreting the position inputted by the input device as indicating a position on a plane parallel to and at a given height above a ground plane in the scene. Preferably, when indicating the position, the height above the ground plane is controllable by means of an additional input device or input parameter such as a scroll wheel, keyboard keys, or pen pressure.
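
A sketch of this interpretation of pointer input, assuming a simple pinhole camera model with the playing field as the plane z = 0 (the helper name pick_on_plane and the toy calibration are assumptions): the pixel is unprojected to a viewing ray, which is intersected with a plane parallel to the ground at a height controlled by the additional input.

    import numpy as np

    def pick_on_plane(pixel, K, R, t, height=0.0):
        """Interpret a pointer position `pixel` (u, v) as a 3D point on the plane
        z = height (the playing field is z = 0; `height` may come from a scroll
        wheel, keyboard keys or pen pressure).

        K: 3x3 intrinsic matrix; R, t: world-to-camera rotation and translation,
        i.e. x_cam = R @ x_world + t.
        """
        cam_center = -R.T @ t                               # camera position in world coordinates
        ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
        ray_world = R.T @ ray_cam                           # viewing ray direction in world coordinates
        if abs(ray_world[2]) < 1e-9:
            return None                                     # ray parallel to the plane: no unique hit
        s = (height - cam_center[2]) / ray_world[2]         # solve cam_center.z + s * ray.z = height
        if s <= 0:
            return None                                     # plane is behind the camera
        return cam_center + s * ray_world

    # Toy calibration: camera 10 m above the field, looking straight down
    K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
    R = np.array([[1.0, 0, 0], [0, -1.0, 0], [0, 0, -1.0]])   # 180 deg about x: looking down
    t = -R @ np.array([0.0, 0.0, 10.0])                       # camera centre at (0, 0, 10)

    print(pick_on_plane((640, 360), K, R, t, height=0.0))   # ~ (0, 0, 0): "painting" on the ground
    print(pick_on_plane((640, 360), K, R, t, height=1.5))   # ~ (0, 0, 1.5): plane 1.5 m above the field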
According to a second main aspect of the invention, a method for generating and processing depth-enhanced images is provided, comprising the steps of
• providing, by means of a physical camera or a storage device or by receiving a broadcast, an image of a scene, the image being a still image or an image in a sequence of video images;
• computing, by means of a depth analysis processor, from the image a depth-enhanced representation of the scene, the depth-enhanced representation comprising information on the relative position of scene elements in (real or virtual) 3D space, wherein each scene element corresponds to a particular object in the scene and to an image segment generated by observing the object with the physical camera;
• rendering and displaying the depth-enhanced representation of the scene on a 3D output device.
In a further preferred embodiment of the invention, the step of rendering and displaying the depth-enhanced representation of the scene comprises the steps of
• defining camera calibration parameters (position, orientation and lens settings) of two viewing cameras, the viewing camera parameters defining a pair of (virtual) stereoscopic cameras;
• rendering, by means of a rendering processor, two rendered stereoscopic images as seen by the two viewing cameras;
• displaying the two rendered stereoscopic images by means of the 3D-display.
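
A minimal sketch of how the two viewing cameras of such a stereoscopic pair can be derived from one reference camera (illustrative code with an assumed camera convention x_cam = R x_world + t; not the patent's algorithm): the camera centre is shifted by half the eye separation along the camera's right axis, and the same scene point then projects to slightly different pixels in the two views.

    import numpy as np

    def stereo_cameras(K, R, t, eye_separation=0.065):
        """Derive a left/right pair of viewing cameras from one reference camera
        by shifting the camera centre by +/- half the eye separation along the
        camera's right axis (the x axis of the camera frame)."""
        center = -R.T @ t                      # reference camera centre in world coordinates
        right = R.T @ np.array([1.0, 0.0, 0.0])
        cams = []
        for sign in (-1.0, +1.0):              # left eye, then right eye
            c = center + sign * 0.5 * eye_separation * right
            cams.append((K, R, -R @ c))        # same orientation and intrinsics, shifted centre
        return cams

    def project(K, R, t, point):
        """Pinhole projection of a 3D world point to pixel coordinates."""
        p = K @ (R @ point + t)
        return p[:2] / p[2]

    K = np.array([[1200.0, 0, 960], [0, 1200.0, 540], [0, 0, 1]])
    R = np.eye(3)
    t = np.zeros(3)                            # reference camera at the origin, looking along +z

    left, right = stereo_cameras(K, R, t)
    scene_point = np.array([0.0, 0.0, 8.0])    # a scene element 8 m in front of the camera
    print(project(*left, scene_point))         # slightly right of the image centre
    print(project(*right, scene_point))        # slightly left: the disparity encodes depth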
In a further preferred embodiment of the invention, the method comprises the step of inputting, by means of an input device, one or more annotation elements and information on the relative position of the one or more annotation elements with regard to the scene elements; the step of rendering and displaying the depth-enhanced representation of the scene then includes the annotation elements.
In a further preferred embodiment of the invention, the image is received through a broadcast, the broadcast further comprising at least one of camera calibration information and a color model; the subsequent computation, rendering and displaying of the depth-enhanced representation is then performed by a receiver of the broadcast, based on the image and at least one of the calibration information and the color model.
An apparatus for generating and processing depth-enhanced images comprises a depth analysis processor and a rendering processor configured to perform the method steps of the method according to one of the preceding claims.
In a preferred embodiment of the invention, the apparatus comprises a receiving unit configured to receive a broadcast, the broadcast comprising at least one image and further comprising at least one of calibration information and a color model, and the depth analysis processor being configured to compute, from the image and at least one of the calibration information and the color model, a depth-enhanced representation of the scene, the depth-enhanced representation comprising information on the relative position of scene elements in 3D space.
A computer program for generating and processing depth-enhanced images is loadable into an internal memory of a digital computer, and comprises computer program code means to make, when said computer program code means is loaded in the computer, the computer execute the method according to the invention. In a preferred embodiment of the invention, a computer program product comprises a computer readable medium, having the computer program code means recorded thereon. The computer readable medium preferably is non-transitory, that is, tangible. In another preferred embodiment of the invention, the computer program is embodied or encoded as a reproducible computer-readable signal, and thus can be transmitted in the form of such a signal.
An important application of the invention lies in the field of processing, annotating and/or displaying still images and video images from sports events. In such situations, one or more of the following points are valid, and the information according to each point can, but need not necessarily, according to different embodiments of the invention, be used to segment an image and optionally also to provide distance information:
• the action takes place on a flat playing field or track (summarily called "playing field"). Alternatively, a field may have a non-flat but known topography (e.g. a golf course or dirt bike track).
• the color of the playing field (background) is known or a color model/color scheme can be deduced from images of it.
• the location of markers on the playing field (such as lines and their corners or intersections) is known, the markers having a different color than the playing field.
• scene elements (players or participants) move, most of the time, on the surface of the playing field. Some scene elements (balls) do not.
• scene elements can be distinguished from the playing field by colour (segmentation, chromakeying).
• scene elements can be classified by color (different teams).
• camera calibration parameters (typically the relative position and orientation of the real or virtual camera with respect to the scene, and optical parameters of the camera) are known or are automatically computed from an image of the playing field and from the information about the location of the markers.
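
To illustrate the last point, camera calibration can be computed from 2D-3D correspondences between detected field markers and their known positions on the playing field. The sketch below uses OpenCV's solvePnP as one possible way to do this; it is not the specific method of the patent, and the marker coordinates, intrinsics and ground-truth pose are invented for the example.

    import numpy as np
    import cv2  # OpenCV

    # Known 3D positions of field markers (line intersections, penalty spot, ...),
    # in meters, with the playing field as the plane z = 0. Values are illustrative.
    field_markers = np.array([
        [0.0,   0.0,  0.0],
        [16.5,  0.0,  0.0],
        [16.5, 40.3,  0.0],
        [0.0,  40.3,  0.0],
        [11.0, 20.15, 0.0],
        [5.5,  10.0,  0.0],
    ], dtype=np.float64)

    # Intrinsics (focal length, principal point) assumed known or pre-estimated.
    K = np.array([[1400.0, 0.0, 960.0],
                  [0.0, 1400.0, 540.0],
                  [0.0,    0.0,   1.0]])
    dist = np.zeros(5)                       # negligible lens distortion assumed

    # For the sketch, fabricate the "detected" pixel positions by projecting the
    # markers with a known ground-truth pose; in practice they would come from a
    # line/corner detector applied to the broadcast image.
    rvec_true = np.array([[1.2], [0.1], [0.05]])
    tvec_true = np.array([[-8.0], [5.0], [45.0]])
    image_points, _ = cv2.projectPoints(field_markers, rvec_true, tvec_true, K, dist)

    # Recover the calibration (pose) from the 2D-3D correspondences.
    ok, rvec, tvec = cv2.solvePnP(field_markers, image_points, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    camera_position = (-R.T @ tvec).ravel()  # camera centre in field coordinates
    print("recovered camera position:", camera_position)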
Given some or all of the above information, it is possible to create a 2.5D or a 3D model from a single image (see, e.g., PCT/CH2007/000265 and references cited therein); the procedure shall only be summarized here: There exist established procedures to segment the image by color, separating players from the background. A player or a group of players is seen as a blob of non-background color. One or more image pixels at the lower side of the blob can be assumed to belong to a part of the player that stands on a point on the playing field. The 3D location of this point is found by intersecting the line of sight corresponding to these pixels with the playing field. The blob image is projected onto a vertical surface ("painted on a billboard") standing at the location of these pixels. According to PCT/CH2007/000265 and references, a view as seen from a virtual camera location is generated by rendering the scene comprising the background field with the painted billboards standing on it. Other implementations can use, as scene elements, more detailed 3D surfaces carrying the blob images, that is, the blob images are projected onto the 3D surfaces. Given other sources of information, e.g. depth information from distance scanners and/or images from additional cameras and/or from images slightly earlier or later in time, information according to one or more of the points listed above may not be required.
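
The billboard construction summarized in the preceding paragraph can be sketched as follows (illustrative code; the camera convention, the helper name blob_foot_position and the toy calibration are assumptions): the lowest pixel of a segmented blob is unprojected to a viewing ray, the ray is intersected with the playing field plane z = 0, and a vertical billboard carrying the blob image would be erected at that point.

    import numpy as np

    def blob_foot_position(blob_mask, K, R, t):
        """Place a player blob in 3D: take the lowest pixel of the blob (assumed
        to touch the playing field), cast a ray through it and intersect with z = 0.

        blob_mask: boolean H x W array of one segmented blob.
        K, R, t:   calibration of the camera that produced the image (x_cam = R x_world + t).
        Returns the 3D foot point on the field, where a vertical billboard
        carrying the blob image would stand.
        """
        rows, cols = np.nonzero(blob_mask)
        lowest = np.argmax(rows)                            # image row index grows downwards
        u, v = cols[lowest], rows[lowest]
        cam_center = -R.T @ t
        ray = R.T @ (np.linalg.inv(K) @ np.array([u, v, 1.0]))
        s = -cam_center[2] / ray[2]                         # intersection with the plane z = 0
        return cam_center + s * ray

    # Toy example: camera 20 m up and 30 m behind the touchline, tilted down 30 degrees
    K = np.array([[1500.0, 0, 960], [0, 1500.0, 540], [0, 0, 1]])
    tilt = np.deg2rad(30)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, -np.sin(tilt), -np.cos(tilt)],
                  [0.0,  np.cos(tilt), -np.sin(tilt)]])
    t = -R @ np.array([0.0, -30.0, 20.0])                   # camera centre at (0, -30, 20)

    blob = np.zeros((1080, 1920), dtype=bool)
    blob[500:620, 930:990] = True                           # a segmented player blob
    print("billboard foot point on the field:", blob_foot_position(blob, K, R, t))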
In the most general case, there are no assumptions on background surface shape and colour, and all positions and shapes of both the background and of moving scene elements are estimated from multiple images.
According to one aspect of the present invention, the information about the relative spatial location of the scene elements is used to insert annotation elements such that they appear, according to their 3D location, behind or in front of the scene elements.
According to another aspect of the present invention, the information about the relative spatial location of the scene elements is used to generate image data for driving a 3D TV display. As this spatial location information can be derived from the image with only little additional information (camera calibration and/or color model), an ordinary TV image stream can be enhanced to drive a 3D display, and the enhancement can take place at the TV receiver itself. There is only a very small additional load on communication.
Throughout the present application, the term "camera" is used for the sake of convenience. It is, however, to be understood that the term may stand for any sensing device providing "image" information on a scene, where an image is a 2D or enhanced representation of the scene. In the simplest case, the image is a 2D camera image; in more sophisticated implementations of the invention, it is a depth-enhanced, e.g. 2.5D or "pseudo 3D", representation including depth information obtained by a distance measurement device such as a laser scanner, or by a stereoscopic (or multiscopic) system using two or more cameras and providing "depth from stereo" information as a starting point for the implementation of the present invention.
Regarding output devices, when the term "3D-display" or "3D output device" is used, it is to be understood as representing any kind of display device that evokes, in a user, the perception of depth, be it on a screen or in 3D space.
The annotation elements are typically defined by a user by means of a graphical input device. Basic 3D shapes of annotation elements (arrows, circles) can be predefined and only stretched and positioned by a user. Entire sets of annotation elements may also be predefined and retrieved from storage for manipulation or adjustment by a user or automatically. According to another embodiment of the invention, annotation elements are generated and/or positioned automatically, for example an "offside wall" computed from player positions, or trajectories of players across the field, determined by means of motion tracking of the players.
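
As an illustration of such an automatically generated annotation (a sketch only, simplified to the common rule of thumb that the offside line is given by the second-to-last defender; the function offside_wall and the field dimensions are assumptions), an "offside wall" can be derived from tracked player positions as a vertical plane at the offside line:

    import numpy as np

    def offside_wall(defender_positions, attack_direction=+1, height=2.2):
        """Compute a vertical "offside wall" annotation from tracked player positions.

        defender_positions: (N, 2) array of defending players on the field plane,
                            x measured along the attacking direction.
        attack_direction:   +1 if the attack moves towards larger x, else -1.
        Returns the x coordinate of the offside line (second-to-last defender) and
        the four corners of a vertical plane (the wall) spanning the field width.
        """
        xs = np.sort(np.asarray(defender_positions)[:, 0])
        # second-to-last defender, counted from the defended goal line
        line_x = xs[1] if attack_direction > 0 else xs[-2]
        y_min, y_max = 0.0, 68.0                 # field width assumed to be 68 m
        corners = np.array([[line_x, y_min, 0.0],
                            [line_x, y_max, 0.0],
                            [line_x, y_max, height],
                            [line_x, y_min, height]])
        return line_x, corners

    defenders = np.array([[2.0, 34.0],   # goalkeeper
                          [11.5, 20.0],
                          [13.0, 45.0],
                          [18.0, 30.0]])
    line_x, wall = offside_wall(defenders, attack_direction=+1)
    print("offside line at x =", line_x)         # 11.5: the second-to-last defender
    print(wall)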
In more detail, the method for generating and processing depth-enhanced images comprises the following steps.
1. For each pixel of the (real or virtual) image of a scene, an ordering labeling or distance measure from the camera (real or virtual) is required (so-called depth).
a. This can be retrieved by pixel calculations, for example, by calculating the distance from the (real or virtual) camera to the position of the element of which the pixel is part.
b. Or it can be obtained with a precise segmentation of the scene into different objects (object separation).
2. Optionally filter image/depth maps for smooth borders.
3. Either
a. For rendering annotations:
i. Rendering the virtual elements (graphical annotations, possibly represented by 3D models, typically defined and positioned by a user) from the same perspective(s) as the image. This step comprises
1. Either: Automatically providing a depth map for the annotation element, according to its position in the scene, together with a stencil (defining where the object lies in the image); stencils are commonly used in computer graphics, e.g. as incorporated in OpenGL.
2. Or: the annotation element can be labeled with an ordering label according to the user's wish, that is, the user defines only the relative placement of the annotation element with regard to the scene elements.
ii. Combining the two images (i.e. rendering the composite image): For each pixel, use the pixel from the image that is less distant (that is, has a lower ordering label) from the (real or virtual) camera, by comparing the distance or the ordering information, respectively (see the compositing sketch after this list).
iii. Optionally, step (i) involves an interaction (a.k.a. telestration) to define the virtual element and its position/depth in the image.
iv. Optionally make sure that steps (i) and (iii), performed in 3D, avoid penetrating the objects (scene elements) with the virtual elements.
b. For displaying on a 3D display
i. Prepare the required information depending on the output device based on one or more of:
1. The input image(s)
2. The depth information associated with the image(s)
3. Optionally: An "empty background" image for one or all of the input images. Such an image can be created by removing the "foreground objects" (that is, scene elements such as players, or the ball).
4. Optionally: Depth information of the "empty background" image(s).
5. Optionally the additional step: Deriving warped or otherwise transformed images depicting the scene from a different viewpoint (for example for stereo- or multiscopic displays), based on the input image(s) and based on
a. Image color information
b. Detected blobs (corresponding to objects)
c. Depth map(s)
d. Calibration information
ii. Transfer the prepared information to the output device. Depending on the type of output device, either the information of items 1 and 2, the information of items 1 to 4, or the information of all items is determined and transferred.
c. Or a combination of (a) and (b), where in step (b) the combined depth map of both the input image(s) and the annotations is used.
As a result, the invention allows for:
• the combination of the step of generating a depth map/order labeling with a rendering of artificial graphical elements, to obtain a realistic-looking annotated scene;
• solving the problem of how to add graphical 3D elements into a 2D (sports) scene in between the objects (players, ...) and not only below (under) or above (over) them. Thus, not only flat (below) or "in front of the foreground" elements are allowed, but any 3D element can be placed into the scene;
• solving the problem of creating a 3D-TV-capable representation easily, with only few requirements, capable of real-time transmission and conversion with hardly any loss of quality/resolution.
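
The per-pixel combination of step (a)(ii) above can be sketched as follows (illustrative code, assuming that the scene image and the rendered annotation image come with depth or ordering maps of the same resolution, and that the annotation stencil is given as an alpha mask; all names are the author's own):

    import numpy as np

    def composite_by_depth(scene_rgb, scene_depth, anno_rgb, anno_depth, anno_alpha):
        """For each pixel, keep whichever of the scene image and the rendered
        annotation image is closer to the (real or virtual) camera.

        scene_depth / anno_depth: per-pixel distance or ordering label (lower = closer);
        anno_alpha:               stencil of the annotation element (0 where the
                                  annotation image is empty).
        """
        # The annotation wins only where it is actually drawn AND not occluded.
        anno_wins = (anno_alpha > 0) & (anno_depth < scene_depth)
        out = scene_rgb.copy()
        out[anno_wins] = anno_rgb[anno_wins]
        return out

    h, w = 4, 6
    scene_rgb = np.zeros((h, w, 3), np.uint8); scene_rgb[:] = (0, 120, 0)
    scene_depth = np.full((h, w), 50.0); scene_depth[:, 2:4] = 10.0   # a "player" 10 m away
    anno_rgb = np.zeros((h, w, 3), np.uint8); anno_rgb[2, :] = (255, 255, 0)
    anno_alpha = np.zeros((h, w)); anno_alpha[2, :] = 1.0             # a horizontal "arrow"
    anno_depth = np.full((h, w), 20.0)                                # arrow placed 20 m away

    out = composite_by_depth(scene_rgb, scene_depth, anno_rgb, anno_depth, anno_alpha)
    # The arrow row is yellow except at columns 2-3, where the closer player
    # (depth 10 < 20) occludes it.
    print(out[2])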
Examples of the application of the inventive method and data processing device for the insertion of annotations, for generating a 3D display, and for a combination of both are described in the following:
Annotations
• "3D offside wall": instead of just placing a line on the ground into the picture, a "wall" (vertical plane) can be added, showing on-side objects in the foreground and off-side objects in the background of the plane. With a semitransparent plane, this effect visualizes the situation even better.
• "Vertical elements": Considering for example Basketball, a graphical element depicting a jump shot consisting of a vertical or curved rising arrow or similar can be added into the scene and appears to be placed at the "correct" 3D location in between the objects.
• "Volume objects", "3D arrows", ... placed at heights above ground in the scene, where the spatial relationship with the scene elements has to be taken into account.
3D Display
• This solution can, for example, be used to broadcast/transmit only the picture, calibration information, and optionally a color model (for each frame or with regular updates or over another channel) to the receiver (TV) which generates the 3D picture itself (instead of transmitting picture and depth or also the "background" and its depth - resulting in either a lower resolution or higher bandwidth requirements). The calibration information specifies the relative position and orientation of the real or virtual camera (i.e. the viewpoint from which the picture is taken) with respect to the scene, and optical parameters of the camera specifying the mapping of the scene onto the camera picture. The color model typically specifies several sets of one or more color ranges, wherein each set of one or more color ranges defines, e.g., the playing field, other background elements, players of one team, players of the other team, etc. Based on this information, the receiver is able to segment the 2D image, model the 3D relation of the playing field with respect to the camera, and determine the location of the players (scene elements) on the playing field. This gives the information required to display the scene on the 3D-display.
o The calibration and color model information requires (compared to the image(s) itself) only very little data and can thus be included with the image without any loss of quality.
o The processing and generation of the required data for the 3D display can be easily integrated in the end-user device.
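To make the bandwidth argument concrete, the following is a minimal sketch (not a defined broadcast format) of the per-frame side information described above; all field names, units and value conventions are illustrative assumptions, chosen only to show how compact calibration and color-model data are compared to the image itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class CameraCalibration:
    position: Tuple[float, float, float]    # camera centre in field coordinates (metres, assumed)
    rotation: Tuple[float, float, float]    # pan, tilt, roll in radians (assumed convention)
    focal_length_px: float                  # focal length expressed in pixels
    principal_point: Tuple[float, float]    # image centre (cx, cy) in pixels

@dataclass
class ColorModel:
    # each class ("field", "team_a", ...) maps to a list of (lower, upper) HSV ranges
    classes: Dict[str, List[Tuple[Tuple[int, int, int], Tuple[int, int, int]]]] = field(default_factory=dict)

# Per-frame side information: a few dozen numbers transmitted next to the image itself.
frame_meta = {
    "calibration": CameraCalibration((0.0, -40.0, 12.0), (0.0, -0.3, 0.0), 1800.0, (960.0, 540.0)),
    "color_model": ColorModel({
        "field":  [((35, 60, 40), (85, 255, 255))],
        "team_a": [((0, 120, 80), (10, 255, 255))],
    }),
}
```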
Combination
• Taking advantage of the added value of a 3D TV by using three dimensional annotation elements increases the feeling of immersion and depth.
Further preferred embodiments are evident from the dependent patent claims. Features of the method claims may be combined with features of the device claims and vice versa.
DEFINITION OF TERMS
In the general as well as in the detailed description of the invention, the following terms are used:

scene: a collection of 3D objects in a 3D environment. The scene may be from the real world, or may be modeled by means of a computer ("virtual scene"). The 3D objects that constitute the scene are also called "scene elements".

view: a 2D or 2.5D or 3D representation of a scene, as seen from a particular viewpoint in 3D space. Again, the view may be generated by a real camera in the real world, or by a virtual camera in a virtual scene. The term "view" expresses the fact that there is a 3D context from which the view is derived. Once a view is generated, and in the context of a display, it may also be considered to be an image.
Note: in some places in this text, the term "scene" is used in lieu of "view", since even when a view is manipulated and an annotation is inserted into the view (without a complete 3D model of the scene existing), conceptually the annotation is considered to be inserted into the scene. Thus, one may talk about "inserting an annotation into a scene", although the computer graphic manipulations operate on a view, or on a view enhanced with depth ordering information (also called a 2.5D view, that is, a two-and-a-half-dimensional view, or pseudo-3D).

different types of cameras:
• A "real camera" is a physical camera capturing images from the real world. The parameters defining the pose (position and orientation) and optical characteristics (focal length, field of view, zoom factor etc.) of the camera are called calibration parameters, since they represent real variables in the real world that usually are determined by a calibration process. A "hypothetical camera" is related to a real camera whose calibration parameters are not known exactly, or not known at all. In such a case the image from the corresponding real camera is processed as if the calibration parameters of the real camera were those of the hypothetical camera. A "virtual camera" is a conceptual entity, defined by a set of camera parameters, which are of the same type as the above calibration parameters, but are predetermined or computed, rather than being calibrated to match the real world. Based on these camera parameters, the virtual camera is used to render images from elements in a 3D scene. The term "rendering" is commonly used in 3D computer graphics to denote the computation of a 2D image from a 3D scene. In the context of a 3D display device, one may also say that a 3D scene is rendered on the display device, be it via 2D images or by a rendering process that does not require 2D images.
• A "viewing camera" may be a virtual camera, used to define a computer generated view, or a real camera, whose captured image of the real scene is enhanced by the virtual annotation elements and/or by creating the depth-enhanced representation of the scene. The annotation elements are computed and rendered from the point of view of a virtual camera having parameters corresponding to real camera, in order to insert the annotation elements correctly (with regard to perspective and visibility) into the depth-enhanced representation. annotation: a graphic element, which conceptually is a 3D object located in 3D space, and which is inserted into an existing scene. Usually this insertion, i.e. the act of annotating, is initiated and/or controlled by a human user.
Note: In some places of this document, annotation elements may be called simply "virtual elements", as opposed to elements of the scene that represent real world objects, such as players in a sports scene, and incorporate image and/or position information from these real world objects. Scene elements representing the real world objects may be inserted, according to their real position, and/or with their associated real image data, into a 3D model and then rendered again for viewing on a display. Such scene elements shall also be considered "real". In a more general context, the scene may also be generated in a purely virtual manner, that is, without image information from a real, static or dynamically unfolding recorded or live event.

component: an element of an image, depending on the manner in which the image is represented. Typically, a component corresponds to a pixel.

connected component: a number of components that are considered together as a unit. In the context of an image, such a unit may be called a blob. In the context of the scene, such a unit often corresponds to an object.
BRIEF DESCRIPTION OF THE DRAWINGS
The subject matter of the invention will be explained in more detail in the following text with reference to preferred exemplary embodiments which are illustrated in the attached drawings, in which is shown in a schematic manner:
Fig. 1a-d: known graphic annotations;
Fig. 2a-b: issues involved in 3D graphic annotations;
Fig. 3: a system for recording, processing and displaying depth-enhanced images;
Fig. 4: different stages in image processing;
Fig. 5: a flow diagram of a method for enhancing images by creating a 3D-enhanced representation incorporating annotations;
Fig. 6: a flow diagram of a method for enhancing images by creating a 3D-enhanced representation for displaying it or rendering it on a 3D display device; and
Fig. 7: an apparatus for receiving image information and for generating and displaying depth-enhanced images.
The reference symbols used in the drawings, and their meanings, are listed in summary form in the list of reference symbols. In principle, identical elements are provided with the same reference symbols in the figures.
PREFERRED EMBODIMENTS OF THE INVENTION
Fig. 3 shows, schematically, a system for recording, processing and displaying depth-enhanced images. A physical camera 1 and further observation devices are arranged to observe a scene 10 comprising a background 11 and objects 12. The further observation devices may be further physical camera(s) 2a, position determining system(s) 2c and distance scanner(s) 2b. A virtual camera 3 is characterized by calibration data (position, orientation, optical parameters), and the system is configured to generate a view of the scene 10 as it would be seen by the virtual camera 3. The system further comprises a computer-readable storage means 4, a data processing unit programmed to operate as a depth analysis processor 5, one or more user interface devices such as a display 6a and an input device 6b (such as a pointing device, keyboard, and the like), a data processing unit programmed to operate as a rendering processor 7, and a 2D or 3D display 8.
Fig. 4 shows, schematically, different stages in image processing as performed by the system:
a) original image with stadium background 21, playing field background 22 and players 23 (in a highly schematic representation);
b) image segmented into background and different scene elements. Each player is represented by an individual segment, i.e. a blob of pixels. Overlapping players (not shown) may be represented by just one blob comprising the pixels corresponding to both players;
c) annotation element 24 in a desired 2D position in the image;
d) scene and annotation element 24 rendered, taking into account the distance ordering of each pixel.
The system, and in particular the depth analysis processor 5 and rendering processor 7 are configured to execute the methods according to one or both of the flowcharts according to Figs. 5 and 6.
Fig. 7 schematically shows an apparatus for receiving image information and for generating and displaying depth-enhanced images. The apparatus comprises a receiving unit 9 for receiving broadcast images and additional information, a depth analysis processor 5 and a rendering processor 7. The apparatus may be embodied as a separate device connected to the 3D display 8, or it may be incorporated, together with the 3D display 8, in a common housing. If no user interaction for annotating scenes is required, no dedicated input and output devices are provided.
The individual steps involved in the preferred embodiments of the invention are explained in the following. This is done by describing different variants for the stages of
• inputting different types of input data used for the image processing,
• processing the input data, and
• outputting the enhanced image data.
Depending on the type of available input data and on user/operator requirements, different methods for processing the input data can be chosen.
INPUT
• One or more (video) images showing a scene, in particular a sports scene. If more than one image is available and to be used, then all images are from the same instant in time (synchronized). Typically, these are video images/frames/fields from TV cameras. The images shall be called "input images" or simply "images", where context allows.
• Optional: calibration information for at least one of the images. Calibration information includes at least camera position, focus and orientation information. It can include other parameters (distortion, etc.). This is not required for method C2 described in the Processing section below.
• Optional: color model information that can be used to separate background from foreground in the image(s). Optionally, the color model includes different information for different foreground objects (e.g. players, referees, ball, etc.) to distinguish between different foreground objects as well. The color model is, for example, determined with user interaction, e.g. by having a user assign labels to blobs in a segmented image, e.g. by identifying particular blobs as being foreground objects or even distinguishing such objects as being part of a particular team. From this, the system learns the color distribution associated with this team. Color models may be either replaced or supplemented with other useful information for the separation of foreground and background, such as shape, edges, or priors/templates. Furthermore, the color model can be learned automatically from the teams' jersey colors (available from the clubs or the associations).
• Optional: One or more video images showing the same [sports] scene at a different time, e.g. one frame before or after the input image mentioned above.
• Optional: "empty" background information of the scene/image, i.e. showing no "foreground" objects/players...
PROCESSING
Step 1: Determine a distance measure for each pixel of an image from a (given or virtual) camera showing the same scene. The distance measure is not required to be a metric - the only requirement is that one is able to compare two measures and determine which one is smaller than the other, that is, which of two or more entities such as a pixel or blob or object etc., each being associated with a measure, lies closer to the camera.
If distance information is required only for a given real camera (that is, a camera providing the input image that is to be processed), the following methods A-E can be applied alone or in combination:
• Method A (external information based): use information from an external device, for example a laser scanner or an object tracking device/method (e.g. by (differential) GPS or RF triangulation, etc.).
• Object positions: Determine the object's location in the image and perform a blob detection including that knowledge (This is similar to method C but with additional distance or location information for refining and guiding the segmentation process which extracts the blobs).
• Scanning device:
o If the device is positioned at essentially the same location as the camera:
Pixel-wise distance information for each input image pixel is directly available.
o If the device is not positioned at essentially the same location as the camera: reproject the scanner's 3D information into the camera space and optionally perform filtering to reduce noise in the distance measurements. In other words: transform the 3D information into 3D surfaces as seen by the camera, and project the camera image onto the 3D surfaces.
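As an illustration of the reprojection described above, the following Python/numpy sketch projects scanner points into the camera image to obtain a sparse per-pixel distance map. It assumes a simple pinhole model with intrinsics K and a pose given by X_cam = R·X_world + t; the function name and the simple closest-point z-buffering loop are illustrative simplifications, not part of the claimed method.

```python
import numpy as np

def project_scanner_points(points_world, K, R, t, image_shape):
    """Project 3D scanner points (N x 3, world coordinates) into the camera image and
    build a sparse per-pixel distance map (np.inf where no scanner point projects)."""
    h, w = image_shape
    pts_cam = points_world @ R.T + t                 # transform into the camera frame
    in_front = pts_cam[:, 2] > 0                     # keep only points in front of the camera
    pts_cam = pts_cam[in_front]
    # pinhole projection: u = fx * x/z + cx, v = fy * y/z + cy
    uv = (pts_cam[:, :2] / pts_cam[:, 2:3]) @ np.diag([K[0, 0], K[1, 1]]) + K[:2, 2]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    dist = np.linalg.norm(pts_cam, axis=1)           # Euclidean distance to the camera centre
    depth_map = np.full((h, w), np.inf)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, di in zip(u[valid], v[valid], dist[valid]):
        if di < depth_map[vi, ui]:                   # keep the closest point per pixel
            depth_map[vi, ui] = di
    return depth_map
```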
• Method B (stereo based): If more than one input image is available, then
• No calibration information is required for the "second" camera, but it can be helpful
• If no 2nd calibration is available:
o Use a stereo algorithm to determine distance information for each input image pixel (or for those pixels or image areas only for which the algorithm gives a result). "Depth from stereo" algorithms are commonly known.
o Such a stereo algorithm can use the color information in order to pre-segment the input image into foreground and background (playing field, stadium e.g.) pixels / parts (see Method C)
• If 2nd calibration information is available, too:
o Use knowledge of the calibration to put constraints on the stereo algorithm, improving its quality.
o Optionally pre-segment the image(s) (see Method C) to gain prior knowledge for even better quality: a blob in image A must correspond to a blob in image B; consequently, intersection results of the corresponding rays restrict the possible depth values. (A minimal depth-from-stereo sketch is given below.)
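The following sketch illustrates the "depth from stereo" step of Method B using OpenCV's standard block-matching stereo as one commonly available implementation; the specific algorithm, its parameters and its fixed-point scaling are properties of that library, not requirements of the method described here, and pre-segmentation or calibration constraints are omitted.

```python
import cv2
import numpy as np

def depth_from_stereo(left_bgr, right_bgr, num_disparities=64, block_size=15):
    """Plain block-matching stereo on a rectified image pair; returns a disparity map
    (larger disparity = closer). Without calibration only the relative ordering is
    meaningful; with calibration, depth = focal_length * baseline / disparity."""
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    matcher = cv2.StereoBM_create(numDisparities=num_disparities, blockSize=block_size)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # StereoBM is fixed-point
    disparity[disparity <= 0] = np.nan               # pixels without a reliable match
    return disparity
```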
• Method C (blob detection): If only one input image and the color model information is available:
• Pixel-wise classification into background and (differing) foreground pixels, wherein the foreground pixels may be classified into different classes, and clustering these pixels into connected components or blobs, corresponding to objects.
• Optionally using manual interaction to distinguish between different objects (object detection, and/or pixel-wise classification)
• Optionally using (semi) automatic object recognition algorithms. These may be based on shape, colour, etc.
• Determining 3D position of the connected components by assuming that these connected components represent objects on the ground plane (field) and intersecting a ray through the connected component or pixel(s) at the bottom of the connected component with the ground plane (or a plane parallel to the ground plane).
• Calculating the distance from the camera to the calculated 3D position of the component and assigning it to all pixels of the connected component (see the sketch below).
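A minimal sketch of the Method C distance assignment: cast a ray through the blob's lowest pixel and intersect it with the ground plane. It assumes a calibrated pinhole camera with pose convention X_cam = R·X_world + t and a world frame whose z axis is vertical with the playing field at z = 0; these conventions and the function name are illustrative assumptions.

```python
import numpy as np

def blob_distance_on_ground(bottom_pixel, K, R, t):
    """Distance measure for a blob: intersect the viewing ray through the blob's lowest
    pixel (u, v) with the ground plane z = 0 and return the camera-to-point distance,
    which is then assigned to all pixels of the blob."""
    cam_center = -R.T @ t                              # camera centre in world coordinates
    u, v = bottom_pixel
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray_world = R.T @ ray_cam                          # ray direction in world coordinates
    if abs(ray_world[2]) < 1e-9:
        return None                                    # ray parallel to the ground plane
    s = -cam_center[2] / ray_world[2]                  # solve (cam_center + s * ray)[2] == 0
    ground_point = cam_center + s * ray_world
    return float(np.linalg.norm(ground_point - cam_center))
```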
• Method C2 (order labeling): If only one input image is given, without calibration, but the color model information is available:
• Classification and separation of input image based on the color model information into multiple objects, as described in the pending patent application PCT/CH2007/000265.
• Using manual interaction to define a layering of the objects describing their rank/distance from the camera. For example, the user clicks on one object (i.e., clicks one of its pixels according to the separation or segmentation) and defines a rank; then the user clicks on another object and defines another rank, where a lower rank indicates that the object is closer to the camera (front) than objects with a higher rank.
• Another variant is to assume a default calibration (e.g. from previous images from the same camera, from arbitrary assumptions, or just from a standard default calibration representing a typical camera setup). Then a distance can be determined by intersecting a ray originating from the center of projection of the camera through each object's lowest pixel with the field plane, and assigning that distance to all pixels belonging to the object according to the separation/segmentation.
• Method C3 (background): If an empty scene background image is available
• a pixel-wise segmentation, without requiring a color model, can be performed to obtain a classification into fore- and background ("background segmentation"). This can be done by subtracting the empty scene image, as projected according to the view seen by the camera providing the input image, from the input image (see the sketch below). Alternatively, a statistical method can be used, assuming that the color which, as seen over time and in different views, appears most often on a background surface is the color of the background itself.
• Continue as in C2.
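A minimal sketch of the differencing variant of Method C3, assuming the "empty" background image has already been projected/registered into the view of the input camera and that both images are 8-bit colour arrays; the threshold value and the per-channel differencing are illustrative choices.

```python
import numpy as np

def background_segmentation(frame_bgr, empty_bgr, threshold=30):
    """Per-pixel foreground mask by differencing the current frame against the
    registered 'empty' background image (both uint8, H x W x 3)."""
    diff = np.abs(frame_bgr.astype(np.int16) - empty_bgr.astype(np.int16))
    foreground = diff.max(axis=2) > threshold        # any channel differs noticeably
    return foreground                                # boolean mask, True = foreground pixel
```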
• Method D (temporal information): If no color model is given but images from different time steps are available:
• Use visual flow algorithm (in combination with calibration information) to distinguish between foreground pixels/objects and background parts.
• Determining an estimated position of such connected components by assuming that these components represent objects on the ground plane (playing field, constituting the background) and intersecting a ray through the component with the ground plane. Alternatively, the ray may be intersected with a plane parallel to the ground plane. This is done under the assumption that certain features, such as the center of gravity of a player and of the connected components, i.e. the blob, corresponding to the player, lie at a certain average height. Intersecting the ray through this center of gravity with a plane at said average height returns the estimated position of the player.
• Calculating the distance from the camera to the calculated position of the component and assigning it to all pixels of the component.
• Method E (multi view): If more than one input image is available including color information
• Apply Method C and use multi-camera information to match components from different views to each other. Then do not intersect rays with the plane, but calculate (in 3D) the point of smallest distance between the rays of corresponding parts of the matched components. Use that point as the position of the component. This allows objects to be located correctly in 3D space, e.g. a flying ball, or players jumping into the air (see the sketch below).
• Calculate the distance from the camera to the calculated position of the component and assign it to all pixels of the component.
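The "point of smallest distance between the rays" used in Method E can be computed in closed form as the midpoint of the common perpendicular of the two viewing rays. A small numpy sketch follows; the variable names and the handling of (nearly) parallel rays are illustrative.

```python
import numpy as np

def midpoint_between_rays(o1, d1, o2, d2):
    """3D point of smallest distance between two rays o1 + s*d1 and o2 + t*d2.
    o1, o2 are the camera centres; d1, d2 are the ray directions."""
    o1, o2 = np.asarray(o1, float), np.asarray(o2, float)
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    w0 = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None                                  # rays are (nearly) parallel
    s = (b * e - c * d) / denom                      # parameter along ray 1
    t = (a * e - b * d) / denom                      # parameter along ray 2
    p1 = o1 + s * d1
    p2 = o2 + t * d2
    return 0.5 * (p1 + p2)                           # e.g. the 3D position of a flying ball
```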
• Depending on the circumstances, various combinations of the above methods are possible.
If the distance information is required for a virtual camera (as described in patent application PCT/CH2007/000265):
• Rendering the scene from one or more virtual viewpoints requires position information of the objects in the scene for a proper parallax effect.
• Rendering the scene from the virtual viewpoint, e.g. with OpenGL, automatically yields a depth map of that scene. This may result in different distances for different parts of an object.
• Assigning each object a (depth) label can be used to generate a "depth map" where each pixel of the same object has the same depth value. This guarantees a consistent depth over the entire object.
• Such a method to determine the position of the objects is described in the pending patent application PCT/CH2007/000265. If a sequence of images of the same camera is available, one can use known temporal coherence methods (e.g. filtering) to improve the quality over time.
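To illustrate the per-object depth labeling mentioned above (one consistent depth value for all pixels of an object), the following sketch collapses a rendered per-pixel depth map to one value per object, here using the median; the choice of the median and the use of -1 as a background label are assumptions for illustration.

```python
import numpy as np

def uniform_object_depths(depth_map, object_ids):
    """Replace the per-pixel depths of each object by a single value (the median),
    so every pixel of the same object receives a consistent depth/rank.
    object_ids: integer label image, -1 marking background pixels (assumed)."""
    out = depth_map.copy()
    for obj in np.unique(object_ids):
        if obj < 0:
            continue                                 # skip background
        mask = object_ids == obj
        out[mask] = np.median(depth_map[mask])
    return out
```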
Once the distance measure (thus, the real distance, or the relative ranking) for the scene elements is known, the annotation elements are inserted by user interaction. Typically, this is done by the user drawing, with a pointing device, on a view/image of the scene. This is explained in further detail below, under "Other aspects".
OUTPUT
On a 2D display (only for annotations): just render the scene in a picture and display it.
On a 3D display (optionally with annotations)
Once the picture and depth/rank information is available, the data can be transformed into the specific format required by the available 3D display.
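As one greatly simplified illustration of transforming picture-plus-depth data into a stereoscopic format, the following sketch shifts pixels horizontally in proportion to inverse depth to synthesize a second view. Real converters handle occlusion, hole filling and the display's specific input format properly; the parameter values and the hole handling here are arbitrary assumptions.

```python
import numpy as np

def simple_stereo_pair(image, depth, eye_separation_px=8):
    """Naive depth-image-based rendering: the input image is used as the left view;
    the right view is synthesized by shifting pixels by a disparity proportional to
    inverse depth (larger depth = farther = smaller shift). Holes stay black."""
    h, w = depth.shape
    disparity = (eye_separation_px / np.maximum(depth, 1e-6)).astype(int)
    left = image
    right = np.zeros_like(image)
    xs = np.arange(w)
    for y in range(h):
        x_new = np.clip(xs - disparity[y], 0, w - 1)   # horizontal shift per pixel
        right[y, x_new] = image[y, xs]
    return left, right
```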
OTHER ASPECTS
• How is the "interaction" (or telestration) to define the virtual element and its position/depth in the image done (see 3.a.iii above)? In a preferred embodiment of the invention, the following steps are implemented in this interaction:
• Input: a pointing device such as a pen or mouse or finger marks a pixel, which corresponds to a ray from the viewpoint through that pixel on the viewing plane. Therefore, it also corresponds to an infinite number of potential depth values. From a geometrical point of view, it is not obvious which depth value is "correct" or user-desired.
• Problem: How do you get the desired depth as an interaction metaphor?
• Possible solutions:
o The pointing device position is interpreted as indicating a position on the ground, i.e. the 3D point chosen to correspond to the pointing device's position is the one where the ray from the viewpoint passes through the ground. This is like "painting" on the ground. However, if 3D annotation objects are supposed to appear at a certain height above that ground position, there is the problem that the object does not appear at the location where the user is interacting with the image.
o "Painting" on a virtual plane parallel to the ground plane but at a specific height (e.g. lm), where most of the drawing objects will appear. Then, the 3D annotation object appears exactly where the interaction takes place. However, all objects are on the same height over ground.
o "Painting" on a virtual plane parallel to the ground plane but with different heights for different tools, or by controlling the height of the annotation element above ground by means of an additional input device, such as a scroll wheel or specific keyboard keys (up/down).
• Preventing annotation elements from intersecting scene objects.
• Input: depth along ray from viewpoint, or equivalent 3D position information, e.g. from an interaction as described above.
• Problem: How do you make sure that the annotation element, such as an arrow, does not go (in 3D) through the scene object but only around it?
• Possible solutions:
o Do not care
o Pre-computed collision map or similar: calculate valid areas/volumes where no object is situated. This can be done in 3D, by intersecting volumes of scene elements and annotation elements, or in 2D, by intersecting areas, where the areas are defined by a vertical projection of the scene and annotation elements onto the playing field. This can be simplified by assuming scene elements (players) to have a fixed shape, such as an upright cylinder of fixed dimensions (see the sketch below). If the user is inserting an annotation element at/through such an area/volume in which it intersects a scene element, the annotation element is automatically and dynamically readjusted, e.g. by bending the annotation element around the scene element. In situations where an annotation element has a variable shape, e.g. an arrow with fixed start and end points and with other control points in between, and the user moves one of the control points, then the intersection detection preferably remains in operation during movement of the control point and causes the line to snap to a trajectory where there is no intersection.
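A minimal sketch of the simplified 2D collision test mentioned above, where each player is approximated by an upright cylinder and only its circular footprint on the playing field is checked; the radius value and the snapping strategy in the comment are illustrative assumptions.

```python
import numpy as np

def collides_with_players(point_xy, player_positions_xy, radius=0.6):
    """True if an annotation control point, projected onto the playing field,
    falls inside the circular footprint of any player cylinder (radius in metres)."""
    p = np.asarray(point_xy, dtype=float)
    for q in np.asarray(player_positions_xy, dtype=float):
        if np.linalg.norm(p - q) < radius:
            return True                              # control point would intersect a player
    return False

# Usage idea: while the user drags a control point, only accept positions where
# collides_with_players(candidate, players) is False; otherwise snap the control
# point (and thus the annotation line) to the nearest non-colliding position.
```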
• Enforcing a particular spatial relationship between annotation element and scene element:
• Problem: in certain situations, it may be desirable to explicitly force an annotation element to be perceived as being located in front or behind a particular scene element or object.
• Possible solution:
o A distance/rank can be manually assigned to that scene element or object, which will cause that object to be rendered in front of or behind the annotation.
o For "front" objects: The object can be rendered a second time, without considering the distance/rank. With that, the object will appear "in front" of the annotation.

Claims

1. Method for generating and processing depth-enhanced images, comprising the steps of
• providing, by means of a physical camera (1, 2a) or a storage device (4), an image of a scene (10), the image being a still image or an image in a sequence of video images;
• computing, by means of a depth analysis processor, from the image a depth-enhanced representation of the scene (10), the depth-enhanced representation comprising information on the relative position of scene elements in (real or virtual) 3D space, wherein each scene element (23) corresponds to a particular object (12) in the scene and to an image segment generated by observing the object with the physical camera;
• inputting, by means of an input device (6b), one or more annotation elements (24) and information on the relative position of the one or more annotation elements (24) with regard to the scene elements (23);
• defining camera parameters (position, orientation and lens settings) of at least one viewing camera, the viewing camera parameters being identical to or an approximation to those of the physical camera (1, 2a), or being parameters of a virtual camera (3);
• rendering, by means of a rendering processor (7), at least one rendered image as seen by the at least one viewing camera, wherein the one or more annotation (24) elements are shown in a spatially consistent relation to the scene elements (23).
2. The method of claim 1, comprising the further step of displaying the at least one rendered image on a 2D or 3D-display (8).
3. The method of one of the preceding claims, wherein in the at least one rendered image the inserted annotation elements (24) appear, according to their location in 3D space, to lie behind or in front of the scene elements (23).
4. The method of one of the preceding claims, wherein the depth-enhanced representation of the scene (10) comprises, for each pixel of the entire image or of part of the image, distance ordering information and wherein the relative position of scene elements (23) and annotation elements (24) is expressed by their distance ordering information.
5. The method of claim 4, wherein the distance ordering information is one of a distance to the viewing camera and a relative ordering according to distance to the viewing camera.
6. The method of one of claims 1-5, wherein the depth-enhanced representation of the scene comprises a mapping of segments of the image onto scene elements of a 3D representation of the scene (10) and wherein the relative position of scene elements (23) and annotation elements (24) is expressed by their location in the 3D representation of the scene (10).
7. The method of one of the preceding claims, comprising the steps of
• by means of one or more further observation devices (2a, 2b, 2c) or the storage device (4), providing further information;
• when computing the depth-enhanced representation of the scene by means of the depth analysis processor (5), taking into account, in addition to the image, this further information;
wherein the one or more further observation devices (2a, 2b, 2c) comprise one or more of distance measuring scanners (2b), further physical cameras (2a), position determining systems (2c).
8. The method of one of the preceding claims, wherein two or more viewing cameras are defined, and an image is rendered for each of the viewing cameras, resulting in a pair of stereoscopic images or a group of multiscopic images; and optionally in a video sequence of pairs or groups of images.
9. The method of one of the preceding claims, wherein the step of inputting one or more annotation elements (24) comprises the steps of computing a 3D intersection of these annotation elements (24) with scene elements (23), detecting an intersection and optionally indicating to a user that an intersection has been detected, and/or optionally also automatically correcting the shape or the position of the annotation element (24) such that no intersection occurs, e.g. by stretching an annotation element (24) to pass around a scene element (23) in the virtual 3D space.
10. The method of one of the preceding claims, wherein the step of inputting information on the relative position of the one or more annotation elements (24) comprises the step of interpreting the position inputted by the input device (6b) as indicating a position on a ground plane (11) in the scene (10).
11. The method of one of the preceding claims, wherein the step of inputting information on the relative position of the one or more annotation elements (24) comprises the step of interpreting the position inputted by the input device (6b) as indicating a position on a plane parallel to and at a given height above a ground plane (11) in the scene (10).
12. The method of claim 11, wherein, when indicating the position, the height above the ground plane (11) is controllable by means of an additional input device or input parameter.
13. A method for generating and processing depth-enhanced images, comprising the steps of
• providing, by means of a physical camera (1, 2a) or a storage device (4) or by receiving a broadcast, an image of a scene (10), the image being a still image or an image in a sequence of video images;
• computing, by means of a depth analysis processor (5), from the image a depth- enhanced representation of the scene (10), the depth-enhanced representation comprising information on the relative position of scene elements (23) in (real or virtual) 3D space, wherein each scene element (23) corresponds to a particular object (12) in the scene (10) and to an image segment generated by observing the object with the physical camera;
• rendering and displaying the depth-enhanced representation of the scene (10) on a 3D output device (8).
14. The method of claim 13, wherein the step of rendering and displaying the depth- enhanced representation of the scene comprises the steps of
• defining camera calibration parameters (position, orientation and lens settings) of two viewing cameras, the viewing camera parameters defining a pair of (virtual) stereoscopic cameras;
• rendering, by means of a rendering processor (7), two rendered stereoscopic images as seen by the two viewing cameras;
• displaying the two rendered stereoscopic images by means of the 3D-display.
15. The method of claim 13 or 14, comprising the step of inputting, by means of an input device (6b), one or more annotation elements (24) and information on the relative position of the one or more annotation elements (24) with regard to the scene elements (23); and, in the step of rendering and displaying the depth-enhanced representation of the scene (10), including the annotation elements.
16. The method of one of claims 13 to 15, wherein the image is received through a broadcast, the broadcast further comprising at least one of camera calibration information and a color model, and comprising the steps of performing the subsequent computation, rendering and displaying of the depth-enhanced representation by a receiving unit (9) of the broadcast, based on the image and at least one of the calibration information and the color model.
17. An apparatus for generating and processing depth-enhanced images, comprising a depth analysis processor (5) and a rendering processor (7) configured to perform the method steps of the method according to one of the preceding claims.
18. The apparatus of claim 17, comprising a receiving unit (9) configured to receive a broadcast, the broadcast comprising at least one image and further comprising at least one of calibration information and a color model, and the depth analysis processor (5) being configured to compute, from the image and at least one of the calibration information and the color model, a depth-enhanced representation of the scene, the depth-enhanced representation comprising information on the relative position of scene elements (23) in 3D space.
19. A non-transitory computer readable medium comprising computer readable program code encoding a computer program that, when loaded and executed on a computer, causes the computer to perform the method according to one of claims 1 through 16.
20. A reproducible computer-readable signal encoding the computer program that, when loaded and executed on a computer, causes the computer to perform the method according to one of claims 1 through 16.
PCT/CH2010/000218 2009-09-10 2010-09-07 Method and apparatus for generating and processing depth-enhanced images WO2011029209A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US24105209P 2009-09-10 2009-09-10
US61/241,052 2009-09-10

Publications (2)

Publication Number Publication Date
WO2011029209A2 true WO2011029209A2 (en) 2011-03-17
WO2011029209A3 WO2011029209A3 (en) 2011-09-29

Family

ID=43732858

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CH2010/000218 WO2011029209A2 (en) 2009-09-10 2010-09-07 Method and apparatus for generating and processing depth-enhanced images

Country Status (1)

Country Link
WO (1) WO2011029209A2 (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6115028A (en) * 1996-08-22 2000-09-05 Silicon Graphics, Inc. Three dimensional input system using tilt
WO1999026198A2 (en) * 1997-11-14 1999-05-27 National University Of Singapore System and method for merging objects into an image sequence without prior knowledge of the scene in the image sequence

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9286941B2 (en) 2001-05-04 2016-03-15 Legend3D, Inc. Image sequence enhancement and motion picture project management system
US8897596B1 (en) 2001-05-04 2014-11-25 Legend3D, Inc. System and method for rapid image sequence depth enhancement with translucent elements
US8953905B2 (en) 2001-05-04 2015-02-10 Legend3D, Inc. Rapid workflow system and method for image sequence depth enhancement
US10298834B2 (en) 2006-12-01 2019-05-21 Google Llc Video refocusing
US9288476B2 (en) 2011-02-17 2016-03-15 Legend3D, Inc. System and method for real-time depth modification of stereo images of a virtual reality environment
US9282321B2 (en) 2011-02-17 2016-03-08 Legend3D, Inc. 3D model multi-reviewer system
WO2012139232A1 (en) * 2011-04-11 2012-10-18 Liberovision Ag Image processing
US10552947B2 (en) 2012-06-26 2020-02-04 Google Llc Depth-based image blurring
US10129524B2 (en) 2012-06-26 2018-11-13 Google Llc Depth-assigned content for depth-enhanced virtual reality images
US20130342526A1 (en) * 2012-06-26 2013-12-26 Yi-Ren Ng Depth-assigned content for depth-enhanced pictures
US9607424B2 (en) * 2012-06-26 2017-03-28 Lytro, Inc. Depth-assigned content for depth-enhanced pictures
US9007365B2 (en) 2012-11-27 2015-04-14 Legend3D, Inc. Line depth augmentation system and method for conversion of 2D images to 3D images
US9547937B2 (en) 2012-11-30 2017-01-17 Legend3D, Inc. Three-dimensional annotation system and method
WO2014085735A1 (en) * 2012-11-30 2014-06-05 Legend3D, Inc. Three-dimensional annotation system and method
US9007404B2 (en) 2013-03-15 2015-04-14 Legend3D, Inc. Tilt-based look around effect image enhancement method
US10334151B2 (en) 2013-04-22 2019-06-25 Google Llc Phase detection autofocus using subaperture images
US9407904B2 (en) 2013-05-01 2016-08-02 Legend3D, Inc. Method for creating 3D virtual reality from 2D images
US9438878B2 (en) 2013-05-01 2016-09-06 Legend3D, Inc. Method of converting 2D video to 3D video using 3D object models
US9241147B2 (en) 2013-05-01 2016-01-19 Legend3D, Inc. External depth map transformation method for conversion of two-dimensional images to stereoscopic images
WO2015005826A1 (en) * 2013-07-09 2015-01-15 Sobolev Sergey Aleksandrovich Method for transmitting and receiving stereo information about a viewed space
RU2543549C2 (en) * 2013-07-09 2015-03-10 Сергей Александрович Соболев Television multiview method of acquiring, transmitting and receiving stereo information on monitored space with automatic measurement thereof "third eye" system
US10218959B2 (en) 2013-07-09 2019-02-26 Limited Liability Company “3D Tv Technics” Method for transmitting and receiving stereo information about a viewed space
CN105556572A (en) * 2013-07-09 2016-05-04 3Dtv工艺有限责任公司 Method for transmitting and receiving stereo information about a viewed space
US9811943B2 (en) 2014-03-17 2017-11-07 Panasonic Intellectual Property Management Co., Ltd. Processing device for label information for multi-viewpoint images and processing method for label information
EP3121792A4 (en) * 2014-03-17 2017-03-15 Panasonic Intellectual Property Management Co., Ltd. Processing device for label information for multi-viewpoint images and processing method for label information
US10567464B2 (en) 2015-04-15 2020-02-18 Google Llc Video compression with adaptive view-dependent lighting removal
US10469873B2 (en) 2015-04-15 2019-11-05 Google Llc Encoding and decoding virtual reality video
US10275898B1 (en) 2015-04-15 2019-04-30 Google Llc Wedge-based light-field video capture
US10546424B2 (en) 2015-04-15 2020-01-28 Google Llc Layered content delivery for virtual and augmented reality experiences
US10565734B2 (en) 2015-04-15 2020-02-18 Google Llc Video capture, processing, calibration, computational fiber artifact removal, and light-field pipeline
US11328446B2 (en) 2015-04-15 2022-05-10 Google Llc Combining light-field data with active depth data for depth map generation
US10341632B2 (en) 2015-04-15 2019-07-02 Google Llc. Spatial random access enabled video system with a three-dimensional viewing volume
US10540818B2 (en) 2015-04-15 2020-01-21 Google Llc Stereo image generation and interactive playback
US10412373B2 (en) 2015-04-15 2019-09-10 Google Llc Image capture for virtual reality displays
US10419737B2 (en) 2015-04-15 2019-09-17 Google Llc Data structures and delivery methods for expediting virtual reality playback
US10205896B2 (en) 2015-07-24 2019-02-12 Google Llc Automatic lens flare detection and correction for light-field images
US9639945B2 (en) 2015-08-27 2017-05-02 Lytro, Inc. Depth-based application of image effects
US9609307B1 (en) 2015-09-17 2017-03-28 Legend3D, Inc. Method of converting 2D video to 3D video using machine learning
US9858649B2 (en) 2015-09-30 2018-01-02 Lytro, Inc. Depth-based image blurring
US11348306B2 (en) 2016-05-02 2022-05-31 Samsung Electronics Co., Ltd. Method, apparatus, and recording medium for processing image
WO2017191978A1 (en) * 2016-05-02 2017-11-09 Samsung Electronics Co., Ltd. Method, apparatus, and recording medium for processing image
US10672180B2 (en) 2016-05-02 2020-06-02 Samsung Electronics Co., Ltd. Method, apparatus, and recording medium for processing image
US10275892B2 (en) 2016-06-09 2019-04-30 Google Llc Multi-view scene segmentation and propagation
US10679361B2 (en) 2016-12-05 2020-06-09 Google Llc Multi-view rotoscope contour propagation
US10594945B2 (en) 2017-04-03 2020-03-17 Google Llc Generating dolly zoom effect using light field image data
US10444931B2 (en) 2017-05-09 2019-10-15 Google Llc Vantage generation and interactive playback
US10474227B2 (en) 2017-05-09 2019-11-12 Google Llc Generation of virtual reality with 6 degrees of freedom from limited viewer data
US10440407B2 (en) 2017-05-09 2019-10-08 Google Llc Adaptive control for immersive experience delivery
US10354399B2 (en) 2017-05-25 2019-07-16 Google Llc Multi-view back-projection to a light-field
US10545215B2 (en) 2017-09-13 2020-01-28 Google Llc 4D camera tracking and optical stabilization
US10965862B2 (en) 2018-01-18 2021-03-30 Google Llc Multi-camera navigation interface
CN110719532A (en) * 2018-02-23 2020-01-21 索尼互动娱乐欧洲有限公司 Apparatus and method for mapping virtual environment
CN110719532B (en) * 2018-02-23 2023-10-31 索尼互动娱乐欧洲有限公司 Apparatus and method for mapping virtual environment
US11808562B2 (en) 2018-05-07 2023-11-07 Apple Inc. Devices and methods for measuring using augmented reality
CN113223563A (en) * 2018-09-29 2021-08-06 苹果公司 Device, method and graphical user interface for depth-based annotation
CN113223563B (en) * 2018-09-29 2022-03-29 苹果公司 Device, method and graphical user interface for depth-based annotation
US11632600B2 (en) 2018-09-29 2023-04-18 Apple Inc. Devices, methods, and graphical user interfaces for depth-based annotation
US11818455B2 (en) 2018-09-29 2023-11-14 Apple Inc. Devices, methods, and graphical user interfaces for depth-based annotation
CN111275611A (en) * 2020-01-13 2020-06-12 深圳市华橙数字科技有限公司 Method, device, terminal and storage medium for determining depth of object in three-dimensional scene
CN111275611B (en) * 2020-01-13 2024-02-06 深圳市华橙数字科技有限公司 Method, device, terminal and storage medium for determining object depth in three-dimensional scene
US11797146B2 (en) 2020-02-03 2023-10-24 Apple Inc. Systems, methods, and graphical user interfaces for annotating, measuring, and modeling environments
US11727650B2 (en) 2020-03-17 2023-08-15 Apple Inc. Systems, methods, and graphical user interfaces for displaying and manipulating virtual objects in augmented reality environments
US11941764B2 (en) 2021-04-18 2024-03-26 Apple Inc. Systems, methods, and graphical user interfaces for adding effects in augmented reality environments

Also Published As

Publication number Publication date
WO2011029209A3 (en) 2011-09-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10757383

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10757383

Country of ref document: EP

Kind code of ref document: A2