WO2024118828A1 - Apparatus and method for real-time three dimensional imaging - Google Patents

Apparatus and method for real-time three dimensional imaging

Info

Publication number
WO2024118828A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
scene
dimensional
digital
image
Application number
PCT/US2023/081670
Other languages
French (fr)
Inventor
Paul Stuart BANKS
Louis Vintro
Bodo Schmidt
Philip Weber
Jason R. Ensher
Charles Stewart Tuvey
Original Assignee
Nlight, Inc.
Application filed by Nlight, Inc.
Publication of WO2024118828A1

Classifications

    • H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/11: Cameras or camera modules comprising electronic image sensors; control thereof, for generating image signals from visible and infrared light wavelengths
    • H04N 23/13: Cameras or camera modules comprising electronic image sensors; control thereof, for generating image signals from different wavelengths with multiple sensors
    • H04N 23/56: Cameras or camera modules comprising electronic image sensors; control thereof, provided with illuminating means
    • H04N 23/60: Control of cameras or camera modules
    • H04N 23/90: Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • H04N 5/2226: Details of television systems; studio circuitry, devices and equipment related to virtual studio applications; determination of depth image, e.g. for foreground/background separation

Definitions

  • Heads-up displays accomplish similar objectives for simple, computer-generated elements.
  • software tools allow computer-generated elements to be overlaid on top of, or behind, traditional images. These traditional software tools are unable to spatially intermix computer-generated elements with real elements, because they lack depth information.
  • large display walls may be used to display computer generated content. The large wall display may then be filmed to provide a background for the actors. Display walls adequately display the background scene to film directors, but depth information is required for visualization of the entire scene, including the foreground and digital elements between the actors and the background display screen.
  • Other approaches employ more complex solutions.
  • real action may be filmed in front of a blue or green screen, using color to separate the foreground from the background.
  • This technique is called chromakeying.
  • motion capture devices are able to track the basic motion of real objects.
  • Software is used to couple the recorded motion with computer generated elements.
  • laser scanners may be employed to digitize the location of static objects.
  • digitizing the location of static objects requires significant time to acquire the necessary data.
  • Some depth or “RGBD” (red green blue depth) cameras have been employed to accomplish real-time or near-real-time results, but these attempts have been too limited in performance and are impractical outside of limited experiments.
  • the present invention provides a system and methods that address one or more deficiencies in the prior art.
  • the present invention employs high definition, three dimensional (“HD3D”) imaging to make it possible to effectively combine computer-generated and real elements, either in real-time or as part of a more extensive offline workflow.
  • the present invention provides an optical system for generating a three dimensional, digital representation of a scene.
  • the optical system includes a nonvisible illumination source to generate and to illuminate the scene with nonvisible light.
  • the optical system also includes a first camera and a second camera.
  • the first camera is adapted to receive light from the scene and generate a first digital map therefrom.
  • the first digital map comprises a plurality of first camera pixels with a first resolution of about 100K – 400M of the first camera pixels.
  • Each first camera pixel of the plurality of first camera pixels is associated with a color.
  • the second camera is adapted to receive the nonvisible light from the scene, the nonvisible light having been generated by the illumination source, and to generate a second digital map therefrom.
  • the second digital map comprises a plurality of second camera pixels with a second resolution of about 100K – 400M of the second camera pixels.
  • Each second camera pixel of the plurality of second camera pixels is associated with a depth. The depth is determined as a function of optical time of flight information for the nonvisible light.
  • the system also includes a processor connected to the first camera and to the second camera. The first digital map and the second digital map are generated synchronously such that there is a scenic correlation therebetween.
  • the processor receives the first digital map and the second digital map and combines the first digital map with the second digital map to generate the three dimensional, digital representation of the scene.
  • the three dimensional, digital representation of the scene satisfies the following parameters.
  • the three dimensional, digital representation of the scene has an image resolution with an image pixel count of greater than or equal to 900,000 pixels (or ≥ 0.9 megapixels, abbreviated as ≥ 0.9 M), wherein the image pixel count comprises a number of final pixels comprising the three dimensional, digital representation of the scene.
  • the three dimensional, digital representation of the scene has an image latency of 10 ms – 30 sec, where the image latency comprises a processing time between receipt of the first and second digital maps by the processor and generation of the three dimensional, digital representation of the scene.
  • the three dimensional, digital representation of the scene involves a distance of 10 cm – 200 m, where the distance is measured from the first and second cameras to an object in the scene.
  • the system also includes a display connected to the processor to display the three dimensional representation.
  • the system of the present invention may incorporate a memory connected to the processor to save the three dimensional, digital representation of the scene.
  • the system of the present invention also is contemplated to encompass instances where successive ones of the three dimensional, digital representations of the scene are assembled by the processor to generate a video with a frame rate of 5 frames per second (“fps”) – 250 fps.
  • the three dimensional digital representation of the scene also satisfies at least one of the following parameters: (1) the image resolution has the image pixel count of greater than or equal to 0.9 M, (2) the distance is 10 cm – 50 m, (3) the frame rate is 23 – 250 fps. Where the system of the present invention is employed in pre-visualization ("pre-viz") of scenes for movie production, the three dimensional, digital representation of the scene also satisfies an image latency of 10 ms – 10 s.
  • the three dimensional, digital representation of the scene satisfies at least one of: (1) an image latency of 10 ms – 1 s, (2) an image resolution with the image pixel count of greater than or equal to 0.9 M, (3) a distance of 2 m – 200 m, and (4) a frame rate of 23 – 250 fps.
  • the three dimensional, digital representation of the scene satisfies at least one of: (1) an image latency of 10 ms – 1 s, (2) an image resolution with the image pixel count of greater than or equal to 0.9 M, (3) a distance of 1 m – 50 m, and (4) a frame rate of 5 – 250 fps.
  • the three dimensional, digital representation of the scene satisfies at least one of: (1) an image latency of 10 ms – 1 s, (2) an image resolution with the image pixel count of greater than or equal to 0.9 M, (3) a distance of 50 cm – 200 m, and (4) a frame rate of 5 – 250 fps.
  • the first camera and the second camera may be housed within a single housing.
  • the single housing is contemplated to include a single optical element and a beam splitter after the optical element to direct the light to the first camera and the nonvisible light to the second camera.
  • the system of the present invention may be constructed so that the first camera encompasses a plurality of first cameras. Moreover, the second camera may encompass a plurality of second cameras.
  • Another contemplated embodiment of the present invention provides a system for generating a three dimensional, digital representation of a scene.
  • the system includes a nonvisible light illumination source to generate and to illuminate the scene with nonvisible light, a first camera and a second camera.
  • the first camera is adapted to receive light from the scene and generate a first digital map therefrom.
  • the first digital map comprises a plurality of first camera pixels with a first resolution of about 100K – 400M of the first camera pixels.
  • Each first camera pixel of the plurality of first camera pixels is associated with a color or with a grayscale.
  • the second camera is adapted to receive the nonvisible light from the scene, the nonvisible light having been generated by the nonvisible light illumination source, and to generate a second digital map therefrom.
  • the second digital map encompasses a plurality of second camera pixels with a second resolution of about 100K – 400M of the second camera pixels.
  • Each second camera pixel of the plurality of second camera pixels is associated with a depth, where the depth is determined as a function of optical time of flight information for the nonvisible light.
  • the system also includes a processor connected to the first camera and to the second camera. The first digital map and the second digital map are generated synchronously such that there is a scenic correlation therebetween.
  • the processor receives the first digital map and the second digital map and combines the first digital map with the second digital map to generate the three dimensional, digital representation of the scene.
  • the three dimensional, digital representation of the scene satisfies the following: (1) an image resolution with an image pixel count of greater than or equal to 0.9 M, wherein the image pixel count comprises a number of final pixels comprising the three dimensional, digital representation of the scene, (2) a distance of 10 cm – 200 m, wherein the distance is measured from the first and second cameras to an object in the scene, and (3) successive ones of the three dimensional, digital representations of the scene are assembled by the processor to generate a video with a frame rate of 5 fps – 250 fps.
  • the system may include a display connected to the processor to display the three dimensional representation.
  • a memory may be connected to the processor to save the three dimensional, digital representation of the scene.
  • the three dimensional digital representation of the scene satisfies at least one of: the distance is 10 cm – 50 m, and the frame rate is 23 – 250 fps.
  • the three dimensional, digital representation of the scene satisfies at least one of: the distance is 1 m – 50 m, and the frame rate is 5 – 250 fps.
  • the three dimensional, digital representation of the scene satisfies at least one of: the distance is 50 cm – 200 m, and the frame rate is 5 – 250 fps.
  • the first camera and the second camera may be housed within a single housing.
  • the single housing may include a beam splitter to direct the light to the first camera and the nonvisible light to the second camera.
  • the first camera may encompass a plurality of first cameras and the second camera may encompass a plurality of second cameras.
  • the present invention is not limited solely to the features and aspects listed above.
  • Fig. 1 is a graphical representation of a first contemplated embodiment of the optical system of the present invention;
  • Fig. 2 is a graphical representation of a second embodiment of the optical system of the present invention;
  • Fig. 3 is a graphical representation of a third embodiment of the optical system of the present invention;
  • Fig. 4 is a graphical representation of one contemplated manipulation of digital information to generate a three dimensional, digital representation of a scene;
  • Fig. 5 provides side-by-side comparisons between digital maps created by systems in the prior art and digital maps created by the optical system of the present invention;
  • Figs. 6 - 11 provide various images illustrating aspects associated with the manipulation of a three dimensional, digital representation of a scene created according to the present invention.
  • Detailed Description of Embodiment(s) of the Invention
  • The present invention will now be described in connection with several examples and embodiments. The present invention should not be understood to be limited solely to the examples and embodiments discussed. To the contrary, the discussion of selected examples and embodiments is intended to underscore the breadth and scope of the present invention, without limitation.
  • The terms “first,” “second,” “third,” etc. may be used to refer to like elements. These terms are employed to distinguish like elements from similar examples of the same elements.
  • one fastener may be designated as a “first” fastener to differentiate that fastener from another fastener, which may be designated as a “second fastener.”
  • the terms “first,” “second,” “third,” are not intended to convey any particular hierarchy between the elements so designated.
  • Fig. 1 is a graphical representation of an optical system 10 according to a first embodiment of the present invention.
  • The term “optical system” is employed because the system of the present invention involves light and optics. Use of the word “optical” should not be understood to limit the scope of the present invention.
  • the system involves components other than optical components.
  • the optical system 10 combines several components that, together, generate a three dimensional, digital representation of a scene 12. In Fig. 1, the scene 12 encompasses three representative objects, a person 14, a train locomotive 16, and a tree 18.
  • the optical system 10 of the present invention captures light reflected from the scene 12 and renders the three dimensional, digital representation from that light.
  • the light received by the optical system 10 includes two components, as described in greater detail hereinbelow.
  • the optical system 10 receives light from the scene 12.
  • the light is used to generate a color map that is used to create the three dimensional, digital representation of the scene 12.
  • the light from the scene 12 may be provided by natural and/or artificial light sources.
  • the light from the scene 12 may combine natural sunlight together with lights provided, for example, from stage spotlights.
  • the optical system 10 receives nonvisible light from the scene 12.
  • the nonvisible light is used to generate a depth map that is used, together with the color map, to generate the three dimensional, digital representation of the scene 12.
  • the color map may encompass actual color, meaning red, blue, and green (“RGB”) components.
  • the color map may be a monochromatic color map, meaning that the color map encompasses a grayscale image.
  • the optical system 10 incorporates a nonvisible light illumination source 20.
  • the nonvisible light illumination source 20 generates nonvisible light 22 that is used to illuminate the scene 12.
  • the nonvisible light 22 may be any type of nonvisible light 22 from the electromagnetic spectrum.
  • Nonvisible light 22 encompasses light that is outside of human visual perception.
  • the nonvisible light 22 is either infrared (“IR”) light or ultraviolet (“UV”) light.
  • the optical system 10 includes a first camera 24 and a second camera 26.
  • the first camera 24 is a color camera that incorporates a first sensor 28 comprising a plurality of first camera pixels 30. In operation, each first camera pixel in the plurality of first camera pixels 30 is associated with a color or a grayscale value.
  • the first sensor 28 is capable of capturing light 32 and transforming the light into a first digital map 34.
  • the first digital map 34 also is referred to herein as a color digital map.
  • the first digital map 34 may be an RGB map, or it may be a grayscale digital map. It is noted that each first camera pixel of the plurality of first camera pixels 30 is associated with a color at each azimuthal and longitudinal position in the two dimensional field of view of the first camera 24.
  • the second camera 26 is a depth camera that incorporates a second sensor 36 comprising a plurality of second camera pixels 38. In operation, each second camera pixel in the plurality of second camera pixels 38 is associated with a depth value.
  • the second sensor 36 is capable of capturing the nonvisible light 40 reflected from the scene 12 and transforming the nonvisible light 40 into a second digital map 42.
  • the second digital map 42 also is referred to herein as a depth digital map. It is noted that each second camera pixel of the plurality of second camera pixels 38 is associated with a depth to a surface, as well as the intensity of light reflected from that surface, at each azimuthal and longitudinal position in the two dimensional field of view of the second camera 26.
  • Concerning the depth value, it is noted that the depth values are generated using optical time of flight (“oTOF”) information that is associated with the nonvisible light 40 received by the second sensor 36 from the scene 12. This is described in greater detail herein below.
  • As also illustrated in Fig. 4, the first digital map 34 and the second digital map 42 are inputted into the processor 46.
  • the processor 46 combines the first digital map 34 and the second digital map 42 to create the three dimensional, digital representation 44 of the scene 12.
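  • As an illustrative sketch of this combination step (assuming a simple pinhole camera model with the two maps already registered to a common pixel grid; the function name, array layout, and intrinsics fx, fy, cx, cy are assumptions, not taken from the text), a colored point cloud is one possible form of the three dimensional, digital representation 44:

        import numpy as np

        def combine_maps(color_map, depth_map, fx, fy, cx, cy):
            """Back-project a per-pixel depth map and attach a color to each point.

            color_map : (H, W, 3) uint8 array, the first (color) digital map
            depth_map : (H, W) float array in meters, the second (depth) digital map
            fx, fy, cx, cy : pinhole intrinsics of the registered cameras
            Returns an (N, 6) array of [x, y, z, r, g, b] points.
            """
            h, w = depth_map.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            z = depth_map
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            valid = z > 0                      # drop pixels with no depth return
            points = np.stack([x[valid], y[valid], z[valid]], axis=1)
            colors = color_map[valid].astype(np.float64)
            return np.hstack([points, colors])

    In practice the two maps must first be aligned to the same pixel grid, as discussed below in connection with parallax.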
  • the processor 46 should not be understood to be a single processor.
  • the processor 46 may encompass a plurality of components (e.g., several processors) without departing from the scope of the present invention. Where there are multiple processors 46, the multiple processors 46 may be operated at different times, as discussed in greater detail below.
  • the plurality of first camera pixels 30 in the first camera 24 generate the first digital map 34 with a first resolution of about 100K – 400M. As should be apparent to those skilled in the art, this indicates that there are about 100K – 400M of the first camera pixels that comprise the plurality of first camera pixels 30.
  • the plurality of second camera pixels 38 in the second camera 26 generate the second digital map 42 with a second resolution of about 100K – 400M. Therefore, this indicates that there are about 100K – 400M of the second camera pixels that comprise the plurality of second camera pixels 38.
  • the first resolution may be less than, equal to, or greater than the second resolution. In one contemplated embodiment, the first resolution is equal to the second resolution, but this is not intended to be limiting of the present invention.
  • the first camera 24 and the second camera 26 are contemplated to operate synchronously. In particular, the cameras 24, 26 function so that, for each three dimensional, digital representation 44 generated, the first digital map 34 and the second digital map 42 have a scenic correlation therebetween.
  • the information recorded in the first digital map 34 may be at least nearly identical to the information recorded in the second digital map 42 – from the perspective of what action is happening within the scene 12.
  • the first digital map 34 and the second digital map 42 may be processed to create the three dimensional, digital representation 44 of the scene 12, because the information in the digital maps 34, 42 correlates with one another. From one perspective, it can be understood that the first digital map 34 is created nearly at the same time (or nearly the same time) as the second digital map 42, so that both digital maps 34, 42 capture effectively the same information from the scene 12.
  • the delay from generating the first digital map 34 to the generation of the second digital map 42 is less than about ½ (one half) of the frame length.
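  • A minimal check of this half-frame criterion (a sketch only; the function name and threshold test are illustrative, not from the text):

        def maps_are_scene_correlated(t_color_s, t_depth_s, frame_rate_fps):
            """True when the color and depth maps were captured within half a frame period."""
            frame_period_s = 1.0 / frame_rate_fps
            return abs(t_color_s - t_depth_s) < 0.5 * frame_period_s

        # e.g. at 24 fps the allowed capture skew is about 20.8 ms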
  • the processor 46 in the optical system 10 is connected to the first camera 24 via a first communication link 48.
  • the processor 46 connects to the second camera 26 via a second communication link 50.
  • the processor 46 connects to the nonvisible light illumination source 20 via a third communication link 52.
  • the processor 46 may be connected to a display 54 via a fourth communication link 56.
  • the processor 46 also may connect to a memory 58 via a fifth communication link 60.
  • the processor 46 may be of any type capable of executing instructions, in the form of software, to generate the three dimensional, digital representation 44 by combining first digital map 34 with the second digital map 42. Any suitable processor 46 may be employed for this purpose.
  • the processor 46 in this embodiment, is contemplated to generate instructions to the nonvisible light illumination source 20 to generate the nonvisible light 22 that is used to illuminate the scene 12. Alternatively, instructions may be issued to the nonvisible light illumination source 20 via other avenues that should be apparent to those skilled in the art.
  • the three dimensional, digital representation 44 is contemplated to satisfy at least one of the following parameters.
  • the three dimensional, digital representation 44 is contemplated to comprise an image resolution with an image pixel count of ≥ 0.9 M.
  • the image pixel count is understood to be suitable for a final pixel presentation. More specifically, the image pixel count may be matched to a display pixel count associated with the display 54, but a matched pixel count is not required for implementation of the present invention.
  • standard, commercially available displays 54 have a display pixel count (or “pixel density”) that may be anywhere from 720p to 8K.
  • these resolutions include 720p (1280 x 720 pixels, a total of 921,600 pixels), 1080p (1920 x 1080 pixels, a total of 2,073,600 pixels), 1440p (2560 x 1440 pixels, a total of 3,686,400 pixels), 2K (2048 x 1080 pixels, a total of 2,211,840 pixels), 4K (3840 x 2160 pixels, a total of 8,294,400 pixels), 5K (5120 x 2880 pixels, a total of 14,745,600 pixels), and 8K (7680 x 4320 pixels, a total of 33,177,600 pixels).
  • So that the three dimensional, digital representation 44 may be displayed readily on the display 54, it is contemplated that the image resolution of the three dimensional, digital representation 44 will match the display resolution of the display 54. Alternatively, the image resolution of the three dimensional, digital representation 44 may be higher or lower than the display resolution. If so, appropriate corrections may be made for display of the three dimensional, digital representation 44 on the display 54, as should be apparent to those skilled in the art.
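  • The pixel counts listed above follow directly from width x height; a short illustrative check against the 0.9 M image pixel count (the names and print format are assumptions; the numbers follow from the resolutions listed):

        DISPLAY_RESOLUTIONS = {            # width, height in pixels
            "720p": (1280, 720), "1080p": (1920, 1080), "1440p": (2560, 1440),
            "2K": (2048, 1080), "4K": (3840, 2160), "5K": (5120, 2880), "8K": (7680, 4320),
        }

        for name, (w, h) in DISPLAY_RESOLUTIONS.items():
            total = w * h
            # every listed display resolution meets the >= 900,000 final pixel count
            print(f"{name}: {total:,} pixels (meets 0.9 M: {total >= 900_000})")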
  • the three dimensional, digital representation 44 is contemplated to have an image latency of 10 ms – 30 sec. The image latency comprises a processing time between receipt of the first and second digital maps 34, 42 by the processor 46 and the generation of the three dimensional, digital representation 44 of the scene 12.
  • the image latency is not limited solely to a range of 10 ms – 30 sec.
  • the image latency may be defined as any range, defined by endpoints, in 10 ms increments, from 10 ms to 30 sec.
  • the image latency range may be 500 ms – 1 sec.
  • Another embodiment contemplates a range for the image latency from 10 ms – 1900 ms.
  • the present invention also encompasses any specific image latency that is equal to each 10 ms endpoint from 10 ms to 30 sec.
  • the three dimensional, digital representation 44 is contemplated to incorporate depth information encompassing a distance of 10 cm – 200 m.
  • the distance is measured from the first and second cameras 24, 26 to an object (e.g., the person 14, the locomotive 16, and/or the tree 18) in the scene 12.
  • the objects in the scene 12 are contemplated to be 1 – 200 m from the first camera 24 and the second camera 26.
  • the distance is not limited solely to a range of 10 cm – 200 m.
  • the distance may be defined as any range, defined by endpoints, in 10 cm increments, from 10 cm to 200 m.
  • the distance may be 10 cm – 5 m.
  • Another embodiment contemplates a range for the distance from 20 m – 100 m.
  • the present invention also encompasses any specific distance that is equal to each 10 cm endpoint from 10 cm to 200 m.
  • specific values for the distance may be 10 cm, 20 cm, 1 m, 5 m, 10 m, and 100 m, as representative examples.
  • a first object distance 62 is shown from the first camera 24 and from the second camera 26 to the person 14.
  • a second object distance 64 designates a separation between the locomotive 16 and the first and second cameras 24, 26.
  • a third object distance 66 identifies the separation between the tree 18 and the first and second cameras 24, 26.
  • Each of these distances 62, 64, 66 are contemplated to lie within the distance range of 10 cm – 200 m.
  • Fig. 1 also illustrates a camera separation distance 68.
  • the camera separation distance 68 is a distance separating the first camera 24 from the second camera 26.
  • the first camera 24 may be separated from the second camera 26 by a distance, identified as the camera separation distance 68.
  • the camera separation distance is contemplated to fall within a range of about 1 cm – 2 m.
  • the first camera 24 and the second camera 26 may be combined into a single unit where the single camera detects both light 32 and nonvisible light 40.
  • the camera separation distance is not limited solely to a range of 1 cm – 2 m.
  • the camera separation distance may be defined as any range, defined by endpoints, in 1 cm increments, from 1 cm to 2 m.
  • the camera separation distance may be 1 cm – 1 m.
  • Another embodiment contemplates a range for the camera separation distance from 50 cm – 1.5 m.
  • the present invention also encompasses any specific camera separation distance that is equal to each 1 cm endpoint from 1 cm to 2 m.
  • the optical system 10 will include a display 54. As shown, the display 54 is contemplated to connect to the processor 46 via the fourth communication link 56. When provided, the display 54 displays the three dimensional, digital representation 44. It is noted that the display 54 may be omitted without departing from the scope of the present invention.
  • the optical system 10 may include a memory 58 connected to the processor 46 via the fifth communication link 60. The memory 58 is contemplated to satisfy one or more operating parameters.
  • the memory 58 provides a location where the three dimensional, digital representation 44 may be stored. Still further, the memory 58 may store the software for execution by the processor 46. It is noted that the memory 58 may be omitted without departing from the scope of the present invention.
  • As noted above, in one embodiment of the present invention, it is contemplated that the display 54 is provided with a display resolution that matches the image resolution of the three dimensional, digital representation 44. However, this construction is not required to remain within the scope of the present invention. It is contemplated that, in other embodiments, the image resolution of the three dimensional, digital representation 44 may differ from the display resolution.
  • the three dimensional, digital representation 44 of the scene 12 has been described as a single frame or still shot image of the scene 12.
  • successive ones of the three dimensional, digital representations 44 of the scene 12 are assembled sequentially, they form a video.
  • the present invention contemplates that successive ones of the three dimensional, digital representations 44 of the scene 12 may be assembled sequentially by the processor 46 to generate the video. If so, the video is contemplated to have a frame rate of between about 5 frames per second (“fps”) and 250 fps.
  • the frame rate is not limited solely to a range of 5 – 250 fps.
  • the frame rate may be defined as any range, defined by endpoints, in 5 fps increments, from 5 fps – 250 fps.
  • the frame rate may be 5 fps – 50 fps.
  • Another embodiment contemplates a range for the frame rate from 15 fps – 25 fps.
  • the present invention also encompasses any specific frame rate that is equal to each 5 fps endpoint from 5 fps – 250 fps.
  • specific values for the frame rate may be 5 fps, 10 fps, 15 fps, 50 fps, and 100 fps, as representative examples.
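  • As a toy sketch of assembling successive representations 44 into a video at a recited frame rate (the function name and data layout are illustrative assumptions only):

        def assemble_video(representations, frame_rate_fps):
            """Attach presentation timestamps to successive 3D representations of the scene."""
            if not 5 <= frame_rate_fps <= 250:
                raise ValueError("frame rate outside the contemplated 5 - 250 fps range")
            period_s = 1.0 / frame_rate_fps
            return [(i * period_s, frame) for i, frame in enumerate(representations)]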
  • Fig. 2 provides a graphical representation of a second embodiment of an optical system 70 contemplated by the present invention.
  • the optical system 70 shares many of the features described in connection with the optical system 10 illustrated in Fig. 1.
  • the nonvisible light illumination source 20, the first camera 24, and the second camera 26 are disposed within a first housing 72.
  • the first housing 72 includes a first optical element 74 that directs the light 32 to the first camera 24.
  • the first housing 72 also includes a second optical element 76 that directs the nonvisible light 40 to the second camera 26.
  • the first and second optical elements 74, 76 may be lenses, for example.
  • FIG.3 is a graphical representation of a third embodiment of an optical system 78 according to the present invention.
  • the first camera 24 and the second camera 26 are enclosed in a second housing 80 such that the first camera 24 is positioned perpendicularly to the second camera 26. It is noted that the perpendicular disposition of the first camera 24 in relation to the second camera 26 is merely exemplary and is not limiting of the present invention.
  • the light 32 and the nonvisible light 40 enter through a third optic into the second housing 80.
  • the third optic 82 may be a lens or a plurality of lenses.
  • the third optic 82 may be an aperture in the second housing 80.
  • the third optic 82 may combine lenses and/or apertures as required and/or as desired.
  • the light 32 and the nonvisible light 40 pass through a fourth optic 84. At the fourth optic 84, the light 32 and the nonvisible light 40 are split from one another.
  • the fourth optic 84 also is referred to as an optical splitter.
  • the optical splitter 84 permits the light 32 to pass therethrough to the first camera 24.
  • the nonvisible light 40 is redirected, perpendicularly to the light 32, so that the nonvisible light 40 is directed to the second camera 26.
  • the fourth optic 84 is contemplated to be an optical component referred to as a beam splitter.
  • any other optical component that splits the light into the light 32 and the nonvisible light 40 may be employed without departing from the scope of the present invention.
  • the optical system 70 and the optical system 78 are contemplated to operate in the same manner as discussed in connection with the optical system 10.
  • the optical systems 10, 70, 78 are contemplated to operate in at least one of five contemplated configurations.
  • the five configurations include, but are not limited to: (1) movie production, (2) “pre-viz” movie production, (3) live broadcast production, (4) volumetric capture, and (5) static capture. Each of these five configurations is discussed below.
  • the optical system 10, 70, 78 is contemplated to operate for movie production.
  • Movie production involves the creation of a video that captures visual information from, for example, actors disposed within the scene 12.
  • the optical system 10, 70, 78 is employed in the context of movie production, the “raw data” of the action generated by the actors in the scene 12 is captured.
  • That “raw data” may encompass, for example, actors performing in front of a blue or green screen, which is a blue or green background employed by those skilled in the art. When actors perform in front of a blue or green screen, background elements are inserted at a later date, during editing, for example. Alternatively, actors may perform in front of one or more light emitting diode (“LED”) walls (otherwise referred to as “light walls”) consisting of digitally-generated backgrounds. Both approaches are commonly found in Virtual Production for movies and series features, for cinematic, TV, or personal device viewing.
  • To operate for movie production, it is contemplated that the optic system 10, 70, 78 will operate according to the following parameters.
  • the image resolution has an image pixel count of greater than or equal to 0.9 megapixels (“M”) (“0.9 M”), which refers to the total number of pixels in the three dimensional, digital representation 44.
  • the distance from the first and second cameras 24, 26 to the scene 12 is between about 10 cm – 50 m.
  • the frame rate of the video is between about 23 – 250 fps.
  • Movie production also encompasses post-production processing to create a final version of a movie that has been prepared for projection to an audience. Post-production editing and manipulation of the action captured during the movie production phase can take days, weeks, months, or even years to generate the three dimensional, digital representation 44. Here, latency is not a parameter.
  • the processor 46 may encompass several processors that are operated at various times during post-production.
  • the optical system 10, 70, 78 is contemplated to operate for “pre-viz” movie production.
  • “Pre-viz” movie production differs from movie production in that the optical system 10, 70, 78 incorporates the display 54 and the three dimensional, digital representation 44 incorporates at least a rough, rendered environment in which the actors are inserted.
  • “Pre-viz” stands for “pre-visualization.”
  • a pre-viz video is contemplated to be displayed to a movie director and/or producer immediately after the movie production so that the producer and/or director may judge if the actors have performed in a satisfactory manner.
  • “pre-viz” movie production may be understood as a preview of the final movie, after editing is completed.
  • the optic system 10, 70, 78 will operate according to the same parameters identified for movie production.
  • the optical system 10, 70, 78 also is contemplated to satisfy the image latency that is between about 10 ms – 10 s.
  • the optical system 10, 70, 78 is contemplated to operate for live broadcast production.
  • the optic system 10, 70, 78 will operate according to the following parameters.
  • the image resolution has the image pixel count of greater than or equal to 0.9 M.
  • the distance from the first and second cameras 24, 26 to the scene 12 is between about 2 m – 200 m.
  • the frame rate of the video is between about 23 – 250 fps.
  • the latency is between about 10 ms – 1 s.
  • volumetric capture refers to the capture of an image and/or video within a defined spatial volume, such as on a sound stage. Volumetric capture is employed in instances where the elements of the scene 12 and the actors are disposed within a predefined space. Volumetric capture may involve generating three dimensional, digital representations 44 from multiple viewpoints and/or multiple angles with respect to the volumetric space.
  • For volumetric capture, at least two images, taken from different viewpoints, are required to generate the three dimensional, digital representation 44.
  • To operate for volumetric capture, it is contemplated that the optic system 10, 70, 78 will operate according to the following parameters.
  • the image resolution has the image pixel count of greater than or equal to 0.9 M.
  • the distance from the first and second cameras 24, 26 to the scene 12 is between about 1 m – 50 m.
  • the frame rate of the video is between about 5 – 250 fps.
  • the latency is between about 10 ms – 1 s.
  • the three dimensional, digital representation 44 may take hours, days, weeks, months, or years to create, depending on the amount of post-production processing required and/or desired. As such, for this variation, latency is not a parameter. Accordingly, in this post-production static camera capture environment, the optic system 10, 70, 78 will operate according to the following parameters.
  • the image resolution has the image pixel count of greater than or equal to 0.9 M.
  • the distance from the first and second cameras 24, 26 to the scene 12 is between about 1 m – 50 m.
  • the frame rate of the video is between about 5 – 250 fps.
  • Static capture refers to the capture of an image and/or video from a single viewpoint, such as would exist if a person were to take a picture using his or her cell phone camera, for example. It is noted that static capture encompasses two separate conditions.
  • the optic system 10, 70, 78 will operate according to the following parameters.
  • the image resolution has the image pixel count of greater than or equal to 0.9 M.
  • the distance from the first and second cameras 24, 26 to the scene 12 is between about 50 cm – 200 m.
  • the frame rate of the video is between about 5 – 250 fps.
  • the latency is between about 10 ms – 1 s.
  • the three dimensional, digital representation 44 may take hours, days, weeks, months, or years to create, depending on the amount of post-production processing required and/or desired. As such, for this variation, latency is not a parameter. Accordingly, in this post-production static camera capture environment, the optic system 10, 70, 78 will operate according to the following parameters. First, the image resolution has the image pixel count of greater than or equal to 0.9 M. Second, the distance from the first and second cameras 24, 26 to the scene 12 is between about 50 cm – 200 m.
  • the frame rate of the video is between about 5 – 250 fps.
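  • Collecting the real-time parameter envelopes recited above into one structure may make the configurations easier to compare (a sketch; the dictionary layout, names, and the all-of check are illustrative assumptions, and the text frames several of these as parameters that may be satisfied individually):

        # Hypothetical helper; the numeric ranges are those recited above.
        CONFIG_ENVELOPES = {
            # latency (s), minimum image pixel count, distance (m), frame rate (fps)
            "pre_viz":        {"latency_s": (0.010, 10.0), "min_pixels": 900_000, "distance_m": (0.10, 50.0),  "fps": (23, 250)},
            "live_broadcast": {"latency_s": (0.010, 1.0),  "min_pixels": 900_000, "distance_m": (2.0, 200.0),  "fps": (23, 250)},
            "volumetric":     {"latency_s": (0.010, 1.0),  "min_pixels": 900_000, "distance_m": (1.0, 50.0),   "fps": (5, 250)},
            "static":         {"latency_s": (0.010, 1.0),  "min_pixels": 900_000, "distance_m": (0.50, 200.0), "fps": (5, 250)},
        }

        def within_envelope(config, latency_s, pixels, distance_m, fps):
            """Return True if a capture falls inside every recited range for a configuration."""
            env = CONFIG_ENVELOPES[config]
            return (env["latency_s"][0] <= latency_s <= env["latency_s"][1]
                    and pixels >= env["min_pixels"]
                    and env["distance_m"][0] <= distance_m <= env["distance_m"][1]
                    and env["fps"][0] <= fps <= env["fps"][1])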
  • the optical systems 10, 70, 78 of the present invention encompass systems and methods that permit the generation of high resolution images of scenes, including wide field of view scenes. Specifically, the systems and methods are contemplated to record, simultaneously, three dimensional position information for multiple objects in a scene with high spatial and distance resolution, along with intensity (grey-scale or color) information about the scene. Both the position and intensity information are recorded for every pixel in an array of pixels for each image.
  • the intensity and position information are combined into a single three-dimensional image that approximates a human view of the scene.
  • This is referred to, herein, as the three dimensional, digital representation 44.
  • This depth information can be in the form of a 2D depth map, which provides a distance value to the surface imaged by that pixel, or it can be in the form of a more extensive 3D representation of the location of the surfaces present in an area or scene, such as a point cloud or surface mesh or voxel grid or similar ways of representing such 3D location information.
  • Eventually such representations can also account for how such location information changes over time.
  • Photogrammetry is an example of a broader class of approaches (stereo, structured light, structure from motion) that use geometry to determine the location of points, objects, or surfaces. Photogrammetry is akin to triangulation in navigation. Microsoft, Apple, and Intel as well as others have created products based on this type of approach, with limited performance.
  • geometry-based approaches are limited in operating range, usually a few meters.
  • Geometry-based approaches also require substantial computational power to correctly compute the location solution for pixel densities relevant for imaging, for example, > 100,000 points.
  • Such products have been used to attempt to capture 3D information for many applications and use cases from robotics to motion pictures. However, the performance of these products has been so poor that many have concluded that such 3D cameras or capture devices are not relevant for these use cases.
  • the other traditional approach to 3D capture is to use electronic means to detect when illumination light makes it back to the camera or sensor.
  • These electronic approaches include time-of-flight (“TOF”) detection, frequency modulated continuous wave (“FMCW”) and amplitude modulated continuous wave (“AMCW”) techniques, and detectors such as linear-mode and Geiger-mode avalanche photodiodes (“APDs”) and single photon avalanche diodes.
  • All of these approaches have, in common, an electronic detector of some fashion that measures a phase change or arrival time of the returned light.
  • single point scanners are able to achieve very high accuracy and point density over long ranges up to 1 km or more. However, single point scanners take minutes to hours to collect a reasonable number of points.
  • Multi-point scanners typically have poorer accuracy and poorer point density than single point scanners. Moreover, multi-point scanners still take many seconds to collect a reasonable number of points.
  • Imaging arrays that can capture image-like depth data have been limited to short range, e.g., less than a few meters, or very low point density, e.g., fewer than 20,000 points or pixels, or both. Those imaging arrays that are able to achieve longer ranges also suffer from very high costs, e.g., $10,000 or more. Imaging arrays have been tried for many use cases, but, with the exception of scanning of static areas where high cost is acceptable, the performance is so poor that adoption has not occurred, and many experts have concluded that such technologies are not relevant for those use cases.
  • the final class of 3D capture technologies is optically based, where the properties of light itself are used to determine changes in distance.
  • these 3D capture technologies have included interferometric techniques and coherent holography.
  • Such systems are extremely costly (e.g., $100K and up). While these technologies are very precise, they do not work well outside laboratories and are limited in operating range and/or point density. They are not compatible with the broader commercial use cases described here.
  • the optical systems 10, 70, 78 of the present invention offer a new approach to solve the 3D capture problem in a practical way that can enable the use of 3D information to enhance and improve these use cases.
  • This approach is referred to herein as all-optical time of flight (“oTOF”).
  • oTOF uses an external modulation device in front of a normal sensor (e.g., the second camera 26) coupled with a pulsed illumination (e.g., by the nonvisible light illumination source 20) to create a modulated image and a non-modulated or reference image.
  • the ratio of these two images, multiplied by the appropriate factor, is a direct measurement of the time when the illumination light returns to the camera and is achieved without measuring the time using any electronic means (e.g., the second digital map).
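  • A highly simplified sketch of the ratio computation (assuming, for illustration only, a linear modulation ramp across the gate; the actual modulation scheme and scaling factor are not specified here, and the function name, gate parameters, and array names are hypothetical):

        import numpy as np

        C = 299_792_458.0   # speed of light, m/s

        def otof_depth(modulated, reference, gate_start_s, gate_length_s):
            """Recover a per-pixel depth map from a modulated/reference image pair.
            modulated, reference : (H, W) float arrays from the depth sensor."""
            ratio = np.divide(modulated, reference,
                              out=np.zeros_like(modulated), where=reference > 0)
            t_return = gate_start_s + np.clip(ratio, 0.0, 1.0) * gate_length_s
            return C * t_return / 2.0   # round-trip time -> one-way distance in meters

    The key point, per the text above, is that the per-pixel return time comes from an image ratio rather than from any electronic timing measurement.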
  • the result of the combination of these variables is a device that can achieve all three key elements at the same time in a single device, e.g., the optical system 10, 70, 78.
  • An added bonus is that the technology scales in cost as any 2D camera and so can achieve costs that are relevant for almost any industry or use case.
  • Resolution
  • The lateral or transverse resolution or point density of the 3D points is related to the feature size that can be identified or manipulated or measured.
  • Fig. 5 illustrates some comparisons of the images 86, 88, 90 produced by current (prior art) devices (left column) and the images 92, 94, 96 produced by the optical systems 10, 70, 78 of the present invention (right column).
  • the top row compares a depth map (an image with greys representing distance for each point) produced by a prior art device (left side, image 86) with a 700,000 point oTOF depth map (according to the optical systems 10, 70, 78 of the present invention) that reaches 10 m (right side, image 92).
  • Image 86 is representative of the Microsoft Kinect Azure approach.
  • In terms of the real resolution (e.g., the size of features that can be readily detected or identified), the fingers, cap bill, and spokes in the bench wagon wheels are clearly visible in the 3D information, e.g., the second digital map produced by the optical system 10, 70, 78. This is the image 92.
  • the second row of images compares a typical (prior art) point cloud (the collection of 3D points rendered in 3D coordinates) from a photogrammetric solution at a scale of about 50 m (left side, image 88) to a point cloud created from an oTOF video feed that reaches 20 m outdoors according to the present invention (right side, image 94).
  • image 94 generated according to the optical system 10, 70, 78 of the present invention shows branches and leaves even at these long distances.
  • high cost precision laser scanners can produce a resolution that is similar to the resolution of the oTOF point cloud, e.g., the three dimensional, digital representation 44 of the present invention.
  • the bottom row of images compares a depth map from a multi-point LIDAR scanner (prior art, left image 90) with a 700,000 point oTOF depth map (e.g., the three dimensional, digital representation 44) (invention, right image 96).
  • the colors in these images represent the distance at each point or pixel.
  • the tricycle is approximately 30 m from the camera or capture device.
  • the optical system 10, 70, 78 of the present invention produces a three dimensional, digital representation 44 with improved resolution or point density, which is evidenced by the ability to identify features or objects, as is important in the present invention.
  • For example, the optical system 10, 70, 78 may produce a depth map pixel count that is greater than about 100,000 points/pixels (with a spatial frequency performance appropriate for such a pixel count, unlike the Kinect prior art device).
  • a depth map of greater than about 300,000 points/pixels (VGA equivalent), or greater than about 500,000 points/pixels, or greater than about 700,000 points/pixels, or greater than about 1,000,000 points/pixels may be desired.
  • optical system 10, 70, 78 of the present invention is capable of achieving these objectives, unlike the prior art.
  • Range
  • the range provided by the optical system 10, 70, 78 of the present invention also improves over the prior art.
  • the range also is discussed herein as the distances 62, 64, 66 from the first and second cameras 24, 26 to objects 14, 16, 18 in the scene 12. The differences in operating range can also be compared with reference to Fig.5.
  • the Microsoft Kinect system (top row, left side) cannot achieve the distances/ranges of 20 – 30 m. Even at 5 m, there are many missing points, which prevent the product from being a solution for many of the use cases described herein.
  • the use cases described below, when in an indoor set may have objects that are within 1 m of the camera or within 3 m of the camera or they may be placed further away. For example, the objects may be placed at distances of > 3 m from the first and second cameras 24, 26, > 10 m, > 20 m, > 30 m, or, in some cases, > 100 m. On some sets or in some projects, the objects may be static or moving or may have video displays with moving or static images.
  • These objects may be located between 1 m and 10 m from the first and second cameras 24, 26 or from 3 m to 30 m from the first and second cameras 24, 26, or 1 m to 30 m or between 2 m and 20 m, or other ranges of locations as may suit the needs. At times the objects may move outside these ranges/distances. In uses that are outdoors, the objects may be located in similar fashion as indoors. They may be also located between 10 m and 100 m from the first and second cameras 24, 26, between 1 m and 100 m from the first and second cameras 24, 26, between 10 m and 50 m, between 3 m and 50 m, or other location ranges as may suit.
  • the optical system 10, 70, 78 of the present invention is capable of accommodating these distances.
  • Speed
  • Speed refers to how long it takes to acquire and use the 3D data of interest: for example, how long it takes to acquire the equivalent of a frame of depth data (or a depth map or a point cloud). It also refers to the latency or lag between the physical event and the time when the 3D data is available to be used in the workflow, and to how fast such 3D data are available in an ongoing fashion (e.g., the frame rate).
  • the latency encompasses the time between when the images are captured by the first and second cameras 24, 26 and the generation of the three dimensional, digital representation 44 by the processor 46.
  • Large studio photogrammetry solutions (such as those built by Canon, Microsoft, or Intel) and smaller, multi-camera volumetric capture solutions can take days to calculate the 3D data.
  • Such large time scales are not practical for these use cases and are too costly.
  • the optical system 10, 70, 78 operates with latencies of 5 sec or less, e.g., latencies of < 1 sec, < 200 ms, < 100 ms, or < 50 ms (or the equivalent in number of frames or other metric).
  • In other embodiments, latencies may be < 10 minutes, < 5 minutes, < 1 min, or < 30 seconds.
  • the present invention also is contemplated to be suitable for projects that require frame rates of 10 fps or higher, including frame rates of > 1 fps, > 20 fps, approximately 24 fps, approximately 30 fps, > 30 fps, approximately 48 fps, approximately 60 fps, > 90 fps, approximately 98 or 100 fps, approximately 120 fps, or other frame rates as may suit. These frames may need to be synchronized with other systems or cameras.
  • the second camera 26 of the optical system 10, 70, 78 of the present invention initially creates a 2D array of distance measurements that corresponds to each pixel of the image sensor(s) used in the 3D camera.
  • This depth map (i.e., the second digital map 42) may be converted into other 3D representations.
  • the points of the depth map may be used as vertices to create a mesh of polygons, for example, triangles or quadrangles, by connecting the points.
  • the depth map (the second digital map 42) may be used to calculate other vertices that are attached or part of a fixed grid or pattern in a volume of space, such as a voxel grid. The method of such calculation may be based on a confidence weighting factor for the distances in the depth map.
  • the intensity of the IR is recorded or provided as an intensity map or IR image.
  • each IR pixel corresponds to a depth value of the depth map.
  • This information can be used to colorize or associate a monochrome or color value to each point or vertex.
  • texture coordinates or other equivalent representations may be generated which provide a correlation between the 3D mesh and the portion of the image or texture that corresponds to that 3D surface.
  • the texture and 3D mesh can then be rendered in appropriate software such as game engines; a minimal meshing sketch is given below.
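  • One way of "connecting the points" as described above is to form two triangles per 2x2 block of pixels (a sketch; the pinhole intrinsics fx, fy, cx, cy are assumed calibration values, and a practical implementation would also drop triangles spanning large depth discontinuities):

        import numpy as np

        def depth_map_to_mesh(depth_map, fx, fy, cx, cy):
            """Turn a 2D depth map into vertices, texture coordinates, and triangle faces."""
            h, w = depth_map.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            z = depth_map
            # back-project every pixel to a 3D vertex
            vertices = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1).reshape(-1, 3)
            # normalized texture coordinates correlate each vertex with the image/texture
            uv = np.stack([u / (w - 1), v / (h - 1)], axis=-1).reshape(-1, 2)
            # connect neighboring pixels into two triangles per 2x2 block
            idx = np.arange(h * w).reshape(h, w)
            a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
            c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
            faces = np.concatenate([np.stack([a, b, c], axis=1), np.stack([b, d, c], axis=1)])
            return vertices, uv, faces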
  • Combination with RGB or other 2D cameras
  • Figs 1 and 2 illustrate examples of how the first and second cameras 24, 26 may be oriented next to each other, separated physically by some distance.
  • the word “camera” may refer to a separately housed and lensed system or modules integrated within a common housing or even sensors that share optical systems in an appropriate fashion as described elsewhere.
  • Obscura In Fill
  • An issue that arises when combining imagery from multiple sources is parallax, as illustrated in Figs. 1 and 2.
  • Parallax results in each pixel’s field of view (“iFOV”) being different between the first and second cameras 24, 26 and/or the images produced thereby (e.g., the first digital map 34 and the second digital map 42), the difference being dependent on the separation between the first and second cameras 24, 26 and the distance 62, 64, 66 that the object 14, 16, 18 is from the first and second cameras 24, 26.
  • Another impact is that there will be surfaces behind the foremost objects 14, 16, 18 that are blocked from the view of one or more of the first and second cameras 24, 26. For example, if the first and second cameras 24, 26 are offset vertically, the bottom camera will not see an area on a background object above the foremost object.
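  • The size of this per-pixel offset can be estimated with the standard stereo relation (a sketch; the function name and the example numbers are illustrative, not from the text):

        def parallax_offset_px(baseline_m, distance_m, focal_length_px):
            """Approximate pixel offset between the two views for an object at a given distance."""
            return focal_length_px * baseline_m / distance_m

        # e.g. with a 10 cm camera separation and a ~1500 px focal length:
        #   ~15 px offset at 10 m, but ~150 px at 1 m, so nearby objects hide more background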
  • The first approach to reducing the effect of parallax is to position the optic axes of the two imaging lenses (e.g., the first optic 74 and the second optic 76) such that they are collinear.
  • a way to achieve this alignment is illustrated in Fig. 3, for example.
  • the first and second cameras 24, 26 are positioned mechanically in 6 degrees of freedom (“DOF”).
  • the precision of this alignment can vary, dependent on the requirements of the particular use. For example, the transverse positioning could be offset by 1 pixel or less, by up to 5 pixels, by up to 20 pixels, by up to 100 pixels, or larger amounts.
  • One image can be rotated with respect to the other by similar amounts.
  • the error in the parallelism of the optic axes can be < 1 microradian, < 20 microradians, < 100 microradians, < 1 milliradian, or larger amounts.
  • the position along the optic axes (the depth) can be adjusted, depending on the settings of each lens.
  • software or mathematical coordinate transformations can be used in conjunction with mechanical positioning to improve the accuracy of the alignment between the images or corresponding pixels on each sensor or set of sensors (or images obtained therefrom).
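  • One common form such a software transformation can take is to reproject each depth pixel into the color camera's pixel grid using calibrated intrinsics and a rigid transform (a sketch; K_depth, K_color, R, and t are assumed calibration values, not given in the text):

        import numpy as np

        def register_depth_to_color(depth_map, K_depth, K_color, R, t):
            """Reproject each depth pixel into the color camera's pixel grid.
            Returns per-pixel (u, v) coordinates in the color image."""
            h, w = depth_map.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x N homogeneous pixels
            rays = np.linalg.inv(K_depth) @ pix                                 # back-project to unit-depth rays
            pts = rays * depth_map.reshape(1, -1)                               # 3D points in the depth camera frame
            pts_c = R @ pts + t.reshape(3, 1)                                   # transform into the color camera frame
            proj = K_color @ pts_c
            return (proj[:2] / proj[2:]).T.reshape(h, w, 2)                     # (u, v) per depth pixel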
  • For the beam splitter configuration shown in Fig. 3, the settings and desired alignments are quite different.
  • the first and second cameras 24, 26 are positioned such that the IR light (e.g., the nonvisible light 40) is reflected to the second (depth) camera 26, while the visible light (e.g., the light 32) is transmitted to a color or monochrome camera, in this case the first camera 24.
  • the timing of the shutters of the first and second cameras 24, 26 can be synchronized so that the correspondence between each image of each camera is known.
  • the spectral characteristics of the splitter optics can be reversed so the IR light (e.g., the nonvisible light 40) is transmitted to the depth camera (the second camera 26). Additional cameras can be added to the combination if desired by using suitable optical components to divide the beam into suitable paths.
  • An embodiment of the present invention would be to position the nonvisible light illumination source 20 to be pointed towards the scene 12, possibly with beam shaping optics to match one or the other of the fields of view of the first and second cameras 24, 26.
  • the illumination pattern can be aligned to match the field of view (“FOV”) of the second (depth) camera 26.
  • the timing of the nonvisible light illumination source 20 will then be synchronized to the timing of the second (depth) camera 26.
  • Another approach to reducing the effect of parallax is to use three or more cameras (not illustrated).
  • the two second (depth) cameras 26 may be placed symmetric about the first (color) camera 24.
  • the two second (depth) cameras 26 may be placed in close physical proximity to the first (color) camera 24.
  • the second optics 76 (e.g., the lens) of the second (depth) cameras 26 may be within 3 cm of the first optics 74 (e.g., the lens) of the first (color) camera 24, within 10 cm, within 20 cm, within 30 cm, more than 30 cm, more than 50 cm, more than 1 m, more than 2 m, or a larger separation distance.
  • the axes of the lenses (of the first and second optics 74, 76) may be roughly parallel or may converge to a determined point or region.
  • the optic axes of the three lenses (e.g., the first and second optics 74, 76) may lie in a single plane or may all lie in different planes as is desired for the use specifics.
  • the nonvisible light illumination source 20 may be provided by a single illumination source or may be the result of two or more illumination sources. If there is more than one illumination source, the illumination pattern between the sources may overlap, may be parallel, or may overlap partially, or be positioned to overlap minimally.
  • the timing and synchronization may be set such that the two second (depth) cameras 26 are approximately coincident in time, offset by a known value, or set to minimize any overlap (for example, offset by the second (depth) camera 26 shutter length, or by between 1X and 2X the shutter length, or between 2X and 4X the shutter length).
  • Software may be used to further improve the depth solutions from the second (depth) cameras 26, each of which produces a 2D depth map (the second digital map 42) and other data.
  • the photogrammetric solution based on triangulation of the common surface location may be calculated separately from the intrinsic depth maps and then the two combined mathematically to improve the 3D location accuracy and precision.
  • one or the other solution may be used as a weighting factor or guide to improve the speed of the overall 3D location solutions.
  • other techniques may be used to combine multiple measurements of the same surface such that the resulting 3D location values are more accurate or more precise or can be effectively represented by a smaller size data set.
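  • As one non-limiting example of combining two measurements of the same surface mathematically, the sketch below applies per-pixel inverse-variance weighting to an intrinsic oTOF depth map and a triangulation-based solution; the array contents and noise figures are illustrative assumptions only.

```python
import numpy as np

# A minimal sketch of inverse-variance weighting for two per-pixel depth estimates
# of the same surface (e.g., an intrinsic oTOF depth map and a triangulation-based
# solution). The arrays and noise figures below are illustrative assumptions.

def fuse_depth(depth_a: np.ndarray, sigma_a: float,
               depth_b: np.ndarray, sigma_b: float) -> np.ndarray:
    """Weight each measurement by 1/sigma^2 so the lower-noise input dominates."""
    w_a, w_b = 1.0 / sigma_a ** 2, 1.0 / sigma_b ** 2
    return (w_a * depth_a + w_b * depth_b) / (w_a + w_b)

if __name__ == "__main__":
    otof = np.full((4, 4), 5.00)            # pretend oTOF depth map, meters
    triangulated = np.full((4, 4), 5.06)    # pretend photogrammetric solution
    print(fuse_depth(otof, 0.01, triangulated, 0.05))
```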
  • Other embodiments may have three or more second (depth) cameras 26 placed as described above.
  • In another contemplated variation of the optical systems 10, 70, 78 of the present invention, two or more first (color) cameras 24 may be employed. There may be a greater number of first (color) cameras 24 than second (depth) cameras 26. Alternatively, the placement of the first (color) camera(s) 24 with respect to the second (depth) camera(s) 26 as described above may be reversed.
  • the output of any of these configurations can be used in a real-time workflow with any software calculations being done with a latency that may be < 1 ms, < 10 ms, < 50 ms, < 200 ms, < 1 sec, < 5 sec, or < 30 sec on appropriate computer hardware (for example, a CPU, a GPU, an FPGA, ASIC, ISP, DSP, or other similar compute platform, or a combination of any of the above). Alternatively, the output can be saved to disk or another storage option (e.g., the memory 58) and used at a later time.
  • the relative timing between the first and second cameras 24, 26 may be set in a variety of ways.
  • the timing of the second (depth) cameras 26 can be synchronized to use the same illumination pattern or even a common illumination. They may also be synchronized such that the illumination and the camera shutter do not overlap.
  • the second (depth) cameras 26 may be timed to occur at the front of the other camera shutter, in the middle, at the end, or other arbitrary time position. These camera timings may be timed to be in sync with other tracking or LED display systems or to be out of sync to minimize interference as desired.
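  • The following sketch illustrates, under assumed shutter durations, how a depth-camera exposure might be scheduled at the front, middle, or end of the other camera's shutter, or after it closes to avoid overlap; the function name and all numeric values are hypothetical and not taken from the present disclosure.

```python
# Hypothetical helper for placing the depth-camera exposure relative to the color
# camera's shutter window: at the front, middle, or end, or after it closes to
# avoid overlap. All durations are assumed example values in microseconds.

def depth_trigger_time(color_open_us: float, color_shutter_us: float,
                       depth_shutter_us: float, mode: str = "middle") -> float:
    if mode == "front":
        return color_open_us
    if mode == "middle":
        return color_open_us + (color_shutter_us - depth_shutter_us) / 2.0
    if mode == "end":
        return color_open_us + color_shutter_us - depth_shutter_us
    if mode == "non_overlapping":
        # start one depth-shutter length after the color shutter closes
        return color_open_us + color_shutter_us + depth_shutter_us
    raise ValueError(f"unknown mode: {mode}")

print(depth_trigger_time(0.0, 10_000.0, 50.0, "middle"))  # -> 4975.0
```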
  • Registration may be employed to measure the relative location and pose of each camera 24, 26 (6 degrees of freedom or 6DOF).
  • This process may be performed at once for a configuration or may be updated frequently or constantly based on other information available.
  • the result of this process is a mathematical correlation between pixels of the two cameras 24, 26 or sensors that depends on lens characteristics, the spatial orientation of the two cameras, and the distance between the real surface and the camera 24, 26.
  • the relationship can be used to transform or map the depth map (e.g., the second digital map 42) onto an equivalent grid that corresponds to the pixels of the other camera, such as the first camera 24.
  • its inverse can be used to map the pixels of the other camera, such as the first (color) camera 24, onto the pixels or depth map from the 3D oTOF camera (the second (depth) camera 26).
  • 3D oTOF cameras (e.g., the second (depth) cameras 26) also provide per-pixel distance information, so this process works well over large volumes, for example, > 3 m across, > 5 m across, > 10 m across, > 15 m across, > 20 m across, > 30 m across, or larger.
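  • A minimal sketch of the pixel-to-pixel mapping described above is given below, assuming pinhole intrinsics for both cameras and a registered 6DOF pose between them; the calibration matrices K_depth and K_color and the rotation R and translation t are hypothetical inputs obtained from the registration process.

```python
import numpy as np

# A minimal sketch, under pinhole-camera assumptions, of the mapping described
# above: back-project each depth pixel into a 3D point, transform it by the
# registered 6DOF pose between the cameras, and project it into the color camera.
# K_depth, K_color (3x3 intrinsics), R (3x3), and t (3,) are hypothetical inputs.

def depth_to_color_pixels(depth_map, K_depth, K_color, R, t):
    h, w = depth_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_map
    # Back-project depth pixels into 3D points in the depth camera frame.
    x = (u - K_depth[0, 2]) * z / K_depth[0, 0]
    y = (v - K_depth[1, 2]) * z / K_depth[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Transform into the color camera frame and project with its intrinsics.
    pts_c = pts @ R.T + t
    u_c = K_color[0, 0] * pts_c[:, 0] / pts_c[:, 2] + K_color[0, 2]
    v_c = K_color[1, 1] * pts_c[:, 1] / pts_c[:, 2] + K_color[1, 2]
    return u_c.reshape(h, w), v_c.reshape(h, w)
```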
  • Multiple camera volumetric capture & constrained photogrammetry
  • capturing 3D location information for all surfaces in this way is sometimes called volumetric video, or “hologram” images in some marketing material (even though such results are very different from a true hologram).
  • for the movie “The Matrix,” a special rig built with over 60 cameras was used to record images and calculate 3D location information in a single plane.
  • the process used for “The Matrix” is commonly referred to as “bullet time.”
  • Intel built a studio specifically for this purpose. The studio was 10,000 square feet in area and used 100 cameras placed around the perimeter. It required a supercomputer to operate and a week of processing for a 10-second movie clip.
  • volumetric video capture using 3D cameras has been viewed as not useful with current solutions and approaches because of low or slow performance, limited resolution for important features, or limited operating distances of stage/scene volumes.
  • the capability described in the previous section can be expanded to include additional cameras in or around an area or volume to capture 3D location information of all surfaces in an area or volume.
  • the increased operating distances for oTOF can make it possible to create volumetric stages or areas that are for example, 3 m or more, or 5 m or more, or 10 m or more, or 50 m or more, or 100 m or more.
  • the increased resolution available with oTOF becomes important as well because the projected area of a pixel becomes larger with the square of the distance. Greater distances require more pixels or denser points to have adequate performance for a given object size, for example arms or legs or fingers or similarly sized non-human objects.
  • the first and second cameras 24, 26 are arranged around the perimeter of an area or volume. For some projects, the first and second cameras 24, 26 could be placed at locations within the volume to capture specific areas.
  • each camera 24, 26 may be placed so that there are minimal or no significant obscured surfaces for the locations and ranges of motions planned.
  • the cameras 24, 26 may be placed to concentrate the fields of view (“FOV”) close to a preferred plane through the volume, such as a horizontal plane at a particular height above the floor.
  • the cameras 24, 26 may be placed approximately uniformly around a partial sphere surrounding the volume of interest.
  • the cameras 24, 26 may be placed at different radii (or the equivalent) from an approximate center of the volume or approximate location of interest.
  • additional cameras 24, 26 can be used to increase the ability of the system to capture 3D location data when objects 14, 16, 18 are closer together or when there are more objects 14, 16, 18 in the scene 12.
  • for example, the system may support object spacing that is < 2% of the area diameter, < 5%, < 10%, or < 20%, and a number of objects each occupying > 0.1% of the total horizontal area (or an equivalent metric along any other plane through the volume), or > 0.5%, or > 1%, or > 2%, or > 5% of the area.
  • the total number of cameras 24, 26 required to achieve a desired level of 3D location point density, capture volume extent, and operating distance will be significantly lower than any current approach (e.g., any approach provided by the prior art).
  • because the constellation of oTOF 3D camera systems (e.g., the optical systems 10, 70, 78 of the present invention) provides different measurements from different locations, photogrammetric 3D location solutions can also be calculated.
  • This photogrammetric calculation will be significantly faster than current photogrammetry solutions because the oTOF depth values provide starting values. These values may be used as weights or limits to speed the photogrammetric calculations.
  • combining the 3D location determinations will increase the performance of the overall system, increasing the working volume size or decreasing the minimum feature size that can be supported by any volumetric system that uses oTOF cameras 24, 26.
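  • As a non-limiting sketch of how per-camera 3D data from such a constellation might be brought into a single volumetric data set, the code below transforms each camera's points into a common world frame using its registered pose; the inputs are hypothetical.

```python
import numpy as np

# A minimal sketch of merging per-camera 3D points into one world-frame point
# cloud, assuming each camera's registered pose is known as a 4x4 camera-to-world
# transform. The inputs are hypothetical.

def merge_point_clouds(camera_points, camera_poses):
    """camera_points[i]: (N_i, 3) points in camera i's frame.
    camera_poses[i]: 4x4 camera-to-world transform for camera i."""
    world = []
    for pts, pose in zip(camera_points, camera_poses):
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
        world.append((homo @ pose.T)[:, :3])             # apply the pose
    return np.vstack(world)
```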
  • Depth keying It is desirable to select elements of a scene 12 to display or composite together with other elements recorded separately or with computer generated (CG) elements.
  • the task of separating or segmenting such elements is known as keying. It is traditionally done by manually rotoscoping the elements (tracing the outline of the object(s) of interest either physically in film or digitally) or by using green or blue backgrounds (as illustrated in Fig. 8). The color is then used as the key to determine foreground or background (known as chromakey). But this requires the extra cost of constructing the special sets or environments and the additional equipment and software to automate the segmentation process. There can also be extra cost and complexity to remove the green or blue tinge from the color images or to match the colors shot at different times or in different sets.
  • depth can be used to provide a key (“depth key”) to distinguish between objects to keep or objects to remove.
  • depth key can be used as a key in corresponding 2D images or other data/information in multiple ways. For example, a single depth value can be used. Any pixel in the 2D image with a corresponding depth value greater than a certain value can be assigned to background or assigned to be transparent or otherwise differentiated for later processing (for example, using the alpha channel in a computer display).
  • the depth or 3D location value can be compared to the plane or other geometric shape and assigned a keying or mask or matte value depending on if the 3D location is on one side of the geometric shape or another.
  • Fig.6 illustrates this.
  • the top image 98 shows a color image with associated depth or 3D location information (such as a mesh). The correspondence between these can be handled via UV coordinates (as known in the art of computer graphics).
  • the second image 100 in Fig.6 shows the same color image, but the 3D location information was used to compare with two planes placed slightly in front of the two walls in the color image. The color pixels with 3D data behind the planes are not displayed so that the walls disappear.
  • the third image 102 in Fig. 6 shows selected elements superposed in front of a new background.
  • the image in Fig.7 provides an image 104 that shows a more complex result where the two actors and chair have been keyed using depth location and planes and are displayed in a CG environment with a CG table and items on the table.
  • a real table (seen in the greyscale depth map in the inset) was keyed out using a series of parallelopiped shapes to segment the table top and table legs so that it is not displayed in the final image.
  • the real cups can then be placed on the real table but they look like they are resting on the digital table in the final output.
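  • A minimal sketch of the plane-based depth key illustrated above is given below, assuming per-pixel 3D locations are available on the color camera's pixel grid; the plane parameters are illustrative assumptions only.

```python
import numpy as np

# A minimal sketch of the plane-based depth key described above: pixels whose 3D
# location lies on the far side of a keying plane are marked transparent in an
# alpha mask. The plane parameters below are illustrative assumptions.

def depth_key_alpha(points_xyz, plane_normal, plane_point):
    """points_xyz: (H, W, 3) per-pixel 3D locations.
    Returns an (H, W) alpha mask: 1.0 = keep (front side), 0.0 = key out."""
    n = np.asarray(plane_normal, dtype=float)
    n /= np.linalg.norm(n)
    signed_dist = (points_xyz - np.asarray(plane_point, dtype=float)) @ n
    return (signed_dist >= 0.0).astype(float)

if __name__ == "__main__":
    pts = np.zeros((2, 2, 3))
    pts[..., 2] = [[3.0, 4.0], [6.0, 7.0]]  # per-pixel depths in meters
    # Key out everything beyond ~5 m (plane at z = 5, normal facing the camera).
    print(depth_key_alpha(pts, plane_normal=(0, 0, -1), plane_point=(0, 0, 5.0)))
```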
  • the software or hardware used to create the final product can vary, depending on the project needs. It can be done in real-time using fast solutions such as a game or render engine (for example, Unity or Unreal) or other custom-built software. The information can be transmitted via a computer network or saved to and read from computer files. The combination can be done using computer software such as Nuke or Maya or Houdini or other similar software for manipulating or combining image or 3D information. Background replacement [00182] One use of keying has become more prevalent in recent years with the advent of the large LED walls that provide a controllable and manipulable background during recording or streaming of imagery and video. However, this displayed background may need to be corrected for a variety of reasons, which may be known before it is displayed or discovered after the fact.
  • Interleaved green frames can be displayed to provide a background for chromakeying the background from the foreground. This requires more expensive equipment, and there can still be errors for moving objects. This can result in costly manual operations as well as require more time than is available.
  • the depth key described earlier can be used instead of requiring interleaved green frames.
  • a depth key may also be used regardless of the background or surroundings that were present during the shooting or recording. The key can be applied in real-time with low latency as illustrated by the figures in the present document, which are screen captures of the live display. Latencies below 100 ms have been achieved.
  • the depth key may be used so that lower cost, coarser pitch display segments can be used for a background display wall and the computer-generated (CG) background in the display virtual camera is just the original CG data and the LED wall provides primarily lighting.
  • the in-camera visual effects allowed by the LED walls may be done entirely with a depth key from 3D oTOF cameras (e.g., second cameras 26) and no physical display is needed.
  • the CG background that can be used from a depth key does not have to correspond to any LED wall size or the physical FOV of the real camera.
  • the backgrounds in the Figures in the previous section are entire 3D worlds of which only a small part is visible at one time in the display virtual camera. The amount of a large CG background that is displayed can be controlled via computer, simulating a zoom capability digitally. This background can be smaller than the physical camera FOV, similar in size, or larger.
  • Real-time preview (“Simulcam+”) [00187] It is often desirable to be able to view some representation that will be close to the final output.
  • Optical or digital viewfinders provide that capability for traditional cameras and now, the digital camera output can often be viewed on a smartphone or tablet or computer or other remote viewing device.
  • when computer-generated content is mixed in some fashion with live or real elements, this becomes difficult, especially in real-time.
  • it is difficult to display the mixed output such that real objects are behind the computer generated (CG) elements.
  • real objects must be placed in specific locations or measured carefully to determine which surfaces will be displayed.
  • a depth map (or other 3D location information of the pixel surfaces) (e.g., the second digital map 42) may be used to provide an improved preview solution.
  • the 3D surface location provides the necessary data to determine whether to display the CG or real surface anytime there is an overlap.
  • the location information also provides the necessary information to determine the relative and absolute scales of the CG and real elements.
  • CG skins may be placed on real objects and follow those objects during any image or video capture.
  • low latency (less than several frames of lag from when the event occurred) depth data allows essentially all visual effects to be displayed in real-time or approximately real-time. This capability could be referred to as Simulcam+.
  • Prior depth capture solutions have been ineffective at accomplishing this and could lead to the conclusion that this is not a desirable approach. They have been much too slow, much too coarse, or only effective at very short operating distances. However, the present invention uses much higher density depth maps that can be captured at any point in typical working volumes to achieve a viable level of performance.
  • Fig. 8 provides an image 106 that shows two real actors and two real props being inserted directly into a CG environment, with a CG object inserted between the body and hands of one of the actors.
  • the real objects are displayed correctly in front of the CG elements. Shadows and other desirable effects are also cast correctly from digital lighting. Any other digital effect can be added to the display.
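  • A minimal sketch of the per-pixel occlusion decision underlying this kind of preview is given below, assuming the CG renderer supplies its own depth buffer on the same pixel grid as the real camera; all array names are hypothetical.

```python
import numpy as np

# A minimal sketch of the per-pixel occlusion decision underlying this kind of
# preview: wherever the CG element's depth is nearer than the measured real-surface
# depth, show the CG pixel; otherwise keep the live pixel. All inputs are
# hypothetical (H x W x 3 color arrays and H x W depth arrays in meters).

def composite_preview(real_rgb, real_depth, cg_rgb, cg_depth):
    cg_in_front = cg_depth < real_depth      # per-pixel depth comparison
    mask = cg_in_front[..., np.newaxis]      # broadcast over the color channels
    return np.where(mask, cg_rgb, real_rgb)
```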
  • Timing and other parameters can be set as described above. Relighting, color grading, and shadow generation [00193] There are times when a project may desire to change the color or lighting or shadows in the final output from what was recorded on a real set. It can happen that things change, or the presence of CG elements makes it important to adjust how the real objects appear. Or the real objects may impact the desired appearance of the CG objects.
  • Still images or other assist 2D cameras may provide information for photogrammetric location solutions or serve as guides for the manual process. But there is no robust solution today.
  • the 3D location data from 3D oTOF cameras (e.g., the second cameras 26) can be used either in real-time or during any post-recording work. For example, in Fig. 9, the first image 108 shows a daylight scene in the shadow of CG or digital trees. The second image 110 shows a nighttime scene lit by candlelight.
  • Fig.8 shows the shadows created by the real objects in the CG environment. These shadows move based on ray-tracing or similar calculations as expected as the digital light source is moved.
  • the 3D location information is necessary for these calculations, whether as points with a small area associated with them or as a mesh of polygons. Other representations of the 3D location data may also be used. Similarly, CG objects cast shadows on the real objects, with the shadow shape determined by the location of the CG object and the shape or contour of the real object. High resolution 3D location data makes this possible at this level of fidelity and for large areas or scenes. [00199]
  • the oTOF 3D location data as provided by the second digital map 42, for example, also provides the necessary information to create the appropriate lighting or shadow look if additional lights are added to the scene. The lights could be standard light sources such as a light bulb or more exotic ones such as a glowing fairy or fireball or any other source of light.
  • the video rates for the high resolution 3D data make it possible to correctly apply the lighting effects even if the CG light is moving or the real object is moving.
  • Other projects may wish to change or adjust the lighting on real objects after the images or video are recorded or even live because of some creative desire.
  • the 3D data may be readily used in software to change the lighting (or color) characteristics of a region so that the distance-dependent effects of light look as they should. For example, a foreground light would brighten objects closer to the light more than objects in the background that are further from the light.
  • This process and result can be done more quickly than current solutions because of the resolution, range, and speed of 3D oTOF cameras.
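  • A minimal sketch of such distance-dependent relighting is given below, applying an inverse-square falloff from a virtual point light using the per-pixel 3D locations; the light position, color, and intensity are illustrative assumptions, and a production renderer would typically also weight by the surface normal (e.g., a Lambertian N·L term), which the sketch omits for brevity.

```python
import numpy as np

# A minimal sketch of distance-dependent relighting: a virtual point light
# brightens each real surface according to the inverse square of its distance,
# using the per-pixel 3D locations. Light position, color, and intensity are
# illustrative assumptions; surface-normal (N dot L) weighting is omitted.

def relight(base_rgb, points_xyz, light_pos, light_rgb, intensity):
    """base_rgb: (H, W, 3) in [0, 1]; points_xyz: (H, W, 3) in meters."""
    d2 = np.sum((points_xyz - np.asarray(light_pos, dtype=float)) ** 2, axis=-1)
    falloff = intensity / np.maximum(d2, 1e-6)            # inverse-square law
    added = falloff[..., np.newaxis] * np.asarray(light_rgb, dtype=float)
    return np.clip(base_rgb + added, 0.0, 1.0)
```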
  • Render software such as game engines or similar software provide mechanisms to check if different meshes collide or overlap.
  • Features like this may be combined with the 3D location information from 3D oTOF cameras (e.g., the second cameras 26) to create interactive effects.
  • the low latency, high speed nature of the 3D data along with the working range and high resolution are valuable to perform effects like this in real time.
  • the current solutions are very limited to well-rehearsed simple motions with specific objects and locations and often fail anyway. It is not practical to do such effects on a regular basis with current solutions.
  • For example, if there is a CG element present in a displayed scene, it has an associated mesh of location data.
  • the 3D oTOF location data of real objects such as a paddle or actor can also be represented by a mesh of polygons or other similar representation.
  • the meshes of each are associated with the displayed image or texture via UV coordinates or similar technique (or colorized points can be used).
  • Mesh collision detection algorithms or similar approaches as appropriate can be used to detect when the two meshes begin to overlap.
  • Software can then be used to move the CG mesh according to defined rules (for example, move the CG mesh away from the real object mesh at the same speed as the real object mesh was moving). Or apply some physical force and response mathematical model. Or apply a random direction and velocity vector. Or apply an acceleration vector according to the length of the mesh overlap or real object mesh velocity vector or a weight or mass coefficient. Or other similar things.
  • Alternatively, the CG object could disappear from view. Or it could catch fire. Or other types of effects could be applied.
  • various math or physical models may be applied to govern what happens to the CG mesh or even aspects of the real object when the two meshes begin to overlap or overlap according to some rules. The result is an ability for real objects to interact with CG objects in ways that may look realistic or fanciful, as desired.
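  • A highly simplified, non-limiting sketch of such an interaction rule is given below; it uses axis-aligned bounding boxes of the two meshes as a stand-in for the full mesh collision tests that game engines provide, and pushes the CG mesh along the real object's velocity vector when an overlap is detected.

```python
import numpy as np

# A highly simplified sketch of such an interaction rule, using axis-aligned
# bounding boxes of the two meshes as a stand-in for the full mesh collision
# tests that game engines provide. On overlap, the CG mesh is pushed along the
# real object's velocity vector -- one of the example responses listed above.

def aabb(points):
    return points.min(axis=0), points.max(axis=0)

def overlaps(box_a, box_b):
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    return bool(np.all(a_min <= b_max) and np.all(b_min <= a_max))

def respond(cg_vertices, real_vertices, real_velocity, dt=1.0 / 30.0):
    if overlaps(aabb(cg_vertices), aabb(real_vertices)):
        return cg_vertices + np.asarray(real_velocity, dtype=float) * dt
    return cg_vertices
```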
  • Volumetric effects (e.g., digital smoke)
  • High precision depth means things like smoke and fog obscure some objects and not others, based on optical path distances.
  • Fig. 10 provides an image 112 that shows an example of digital fog coming from the background; the woman is obscured by the fog before the man in the foreground is.
  • CG objects could be chosen to be affected differently than real objects.
  • Specific objects may be chosen to be affected differently than the rest (for example, never blurred).
  • the effect can be applied to all objects (real or CG), objects between 2 m and 20 m, between 3 m and 30 m, between 2 m and 10 m, beyond 10 m, beyond 20 m, between 2 m and 5 m, between 5 m and 10 m.
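  • A minimal sketch of such depth-dependent digital fog is given below, blending each pixel toward a fog color with an exponential (Beer-Lambert style) falloff along the optical path so that distant surfaces are obscured before nearer ones; the density and fog color values are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of depth-dependent digital fog: each pixel is blended toward a
# fog color using an exponential (Beer-Lambert style) falloff along the optical
# path, so distant surfaces are obscured before nearer ones. The density and fog
# color values are illustrative assumptions.

def apply_fog(rgb, depth_m, fog_rgb=(0.7, 0.7, 0.75), density=0.08):
    """rgb: (H, W, 3) in [0, 1]; depth_m: (H, W) per-pixel distance in meters."""
    transmittance = np.exp(-density * depth_m)   # ~1 near the camera, -> 0 far away
    t = transmittance[..., np.newaxis]
    return t * rgb + (1.0 - t) * np.asarray(fog_rgb, dtype=float)
```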

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

An optical system for generating a three dimensional, digital representation of a scene includes a nonvisible illumination source, a first camera to receive light from the scene and generate a first digital map therefrom, and a second camera to receive the nonvisible light from the scene, the nonvisible light having been generated by the illumination source, and to generate a second digital map therefrom. The first digital map concerns color and the second digital map concerns depth. The system also includes a processor that receives the first and second digital maps and generates a three dimensional, digital representation of a scene. The three dimensional, digital representation of the scene satisfies the following: (1) an image resolution with an image pixel count greater than or equal to 0.9 M (megapixels), (2) an image latency of 10 ms – 30 sec, and (3) a distance of 10 cm – 200 m.

Description

APPARATUS AND METHOD FOR REAL-TIME THREE-DIMENSIONAL IMAGING Cross-Reference to Related Patent Application(s) [0001] This application relies for priority on U.S. Provisional Patent Application Serial No. 63/429,135, filed on November 30, 2022, the entire content of which is incorporated herein by reference. Field of the Invention [0002] The present invention pertains to the field of three-dimensional imaging and various parameters associated therewith. Description of the Related Art [0003] As should be apparent to those skilled in the art, digital images are employed in numerous industries for various purposes. [0004] One industry that has become increasingly interested in and reliant on three dimensional digital images is the entertainment industry, particularly in the areas of live broadcasts, movie production, and gaming, among others. [0005] In these industries, there has developed a desire to combine computer-generated or digital elements with live or real elements or with objects in real time. There are many names used to describe the ability to combine computer-generated or digital elements with live or real elements or objects: augmented reality, mixed reality, visual effects, and virtual production, amongst others. [0006] However, in most cases, combining live elements and digital elements presents a number of difficulties and requires specialized tools and techniques and significant amounts of manual labor for the best results. [0007] For example, Microsoft has created the “Hololens” product. The “Hololens” displays computer generated content onto a transparent lens in front of a person’s face to create an overlay on the information that the human eye perceives in a scene being viewed. Heads-up displays accomplish similar objectives for simple, computer-generated elements. [0008] For film and TV productions, software tools allow computer-generated elements to be overlaid on top of, or behind, traditional images. These traditional software tools are unable to spatially intermix computer-generated elements with real elements, because they lack depth information. [0009] Still further, in some implementations, large display walls may be used to display computer generated content. The large wall display may then be filmed to provide a background for the actors. Display walls adequately display the background scene to film directors, but depth information is required for visualization of the entire scene, including the foreground and digital elements between the actors and the background display screen. [0010] Other approaches employ more complex solutions. For example, real action may be filmed in front of a blue or green screen, using color to separate the foreground from the background. This technique is called chromakeying. In this implementation, motion capture devices are able to track the basic motion of real objects. Software is used to couple the recorded motion with computer generated elements. Here, laser scanners may be employed to digitize the location of static objects. However, digitizing the location of static objects requires significant time to acquire the necessary data. [0011] Some depth or “RGBD” (red green blue depth) cameras have been employed to accomplish real-time or near-real-time results, but these attempts have been too limited in performance and are impractical outside of limited experiments. 
[0012] It has been unclear across many industries how to accomplish the mixing of reality with computer generated elements or even if such combinations are even useful. For example, Facebook has promoted a cartoon-looking “metaverse” where nothing looks like reality. Others have envisioned products where both real and computer-generated elements look real. [0013] Technology to capture reality in an acceptable way (including 3D data, at high- enough frame rates) is a large bottleneck to further progress. [0014] Better solutions are needed. Summary of the Invention [0015] The present invention provides a system and methods that address one or more deficiencies in the prior art. [0016] Generally speaking, the present invention employs high definition, three dimensional (“HD3D”) imaging to make it possible to effectively combine computer-generated and real elements, either in real-time or as part of a more extensive offline workflow. [0017] In one contemplated embodiment, the present invention provides an optical system for generating a three dimensional, digital representation of a scene. The optical system includes a nonvisible illumination source to generate and to illuminate the scene with nonvisible light. The optical system also includes a first camera and a second camera. The first camera is adapted to receive light from the scene and generate a first digital map therefrom. The first digital map comprises a plurality of first camera pixels with a first resolution of about 100K – 400M of the first camera pixels. Each first camera pixel of the plurality of first camera pixels is associated with a color. The second camera is adapted to receive the nonvisible light from the scene, the nonvisible light having been generated by the illumination source, and to generate a second digital map therefrom. The second digital map comprises a plurality of second camera pixels with a second resolution of about 100K – 400M of the second camera pixels. Each second camera pixel of the plurality of second camera pixels is associated with a depth. The depth is determined as a function of optical time of flight information for the nonvisible light. The system also includes a processor connected to the first camera and to the second camera. The first digital map and the second digital map are generated synchronously such that there is a scenic correlation therebetween. The processor receives the first digital map and the second digital map and combines the first digital map with the second digital map to generate the three dimensional, digital representation of the scene. The three dimensional, digital representation of the scene satisfies the following parameters. First, the three dimensional, digital representation of the scene has an image resolution with an image pixel count of greater than or equal to 900,000 pixels (or ≥ 0.9 megapixels, abbreviated as ≥ 0.9 M), wherein the image pixel count comprises a number of final pixels comprising the three dimensional, digital representation of the scene. Second, three dimensional, digital representation of the scene has an image latency of 10 ms – 30 sec, where the image latency comprises a processing time between receipt of the first and second digital maps by the processor and generation of the three dimensional, digital representation of the scene. The three dimensional, digital representation of the scene involves a distance of 10 cm – 200 m, where the distance is measured from the first and second cameras to an object in the scene. 
[0018] In another contemplated embodiment, the system also includes a display connected to the processor to display the three dimensional representation. [0019] Still further, the system of the present invention may incorporate a memory connected to the processor to save the three dimensional, digital representation of the scene. [0020] The system of the present invention also is contemplated to encompass instances where successive ones of the three dimensional, digital representations of the scene are assembled by the processor to generate a video with a frame rate of 5 frames per second (“fps”) – 250 fps. [0021] When the system of the present invention is employed for movie production, the three dimensional digital representation of the scene also satisfies at least one of the following parameters: (1) the image resolution has the image pixel count of greater than or equal to 0.9 M, (2) the distance is 10 cm – 50 m, (3) the frame rate is 23 – 250 fps. [0022] Where the system of the present invention is employed in pre-visualization (pre- viz) of scenes for movie production, the three dimensional, digital representation of the scene also satisfies an image latency is 10 ms – 10 s. [0023] When the system of the present invention is employed in the context of live broadcast events, the three dimensional, digital representation of the scene satisfies at least one of: (1) an image latency is 10 ms – 1 s, (2) an image resolution has the image pixel count of greater than or equal to 0.9 M, (3) a distance is 2 m – 200 m, and (4) a frame rate is 23 – 250 fps. [0024] When the system of the present invention is used for volumetric capture, the three dimensional, digital representation of the scene satisfies at least one of: (1) an image latency of 10 ms – 1 s, (2) an image resolution has the image pixel count of greater than or equal to 0.9 M, (3) a distance is 1 m – 50 m, and (4) a frame rate is 5 – 250 fps. [0025] When the system of the present invention is employed for static capture, the three dimensional, digital representation of the scene satisfies at least one of: (1) an image latency is 10 ms – 1 s, (2) an image resolution has the image pixel count of greater than or equal to 0.9 M, (3) a distance is 50 cm – 200 m, and (4) a frame rate is 5 – 250 fps. [0026] Still further, the first camera and the second camera may be housed within a single housing. [0027] If the first sensor and the second sensor are housed within a single housing, the single housing is contemplated to include a single optical element and a beam splitter after the optical element to direct the light to the first camera and the nonvisible light to the second camera. [0028] The system of the present invention may be constructed so that the first camera encompasses a plurality of first cameras. Moreover, the second camera may encompass a plurality of second cameras. [0029] Another contemplated embodiment of the present invention provides a system for generating a three dimensional, digital representation of a scene. The system includes a nonvisible light illumination source to generate and to illuminate the scene with nonvisible light, a first camera and a second camera. The first camera is adapted to receive light from the scene and generate a first digital map therefrom. The first digital map comprises a plurality of first camera pixels with a first resolution of about 100K – 400M of the first camera pixels. Each first camera pixel of the plurality of first camera pixels is associated with a color or with a grayscale. 
The second camera is adapted to receive the nonvisible light from the scene, the nonvisible light having been generated by the nonvisible light illumination source, and to generate a second digital map therefrom. The second digital map encompasses a plurality of second camera pixels with a second resolution of about 100K – 400M of the second camera pixels. Each second camera pixel of the plurality of second camera pixels is associated with a depth, where the depth is determined as a function of optical time of flight information for the nonvisible light. The system also includes a processor connected to the first camera and to the second camera. The first digital map and the second digital map are generated synchronously such that there is a scenic correlation therebetween. The processor receives the first digital map and the second digital map and combines the first digital map with the second digital map to generate the three dimensional, digital representation of the scene. The three dimensional, digital representation of the scene satisfies the following: (1) an image resolution with an image pixel count of greater than or equal to 0.9 M, wherein the image pixel count comprises a number of final pixels comprising the three dimensional, digital representation of the scene, (2) a distance of 10 cm – 200 m, wherein the distance is measured from the first and second cameras to an object in the scene, and (3 successive ones of the three dimensional, digital representations of the scene are assembled by the processor to generate a video with a frame rate of 5 fps – 250 fps. [0030] In this embodiment, the system may include a display connected to the processor to display the three dimensional representation. [0031] Here also, a memory may be connected to the processor to save the three dimensional, digital representation of the scene. [0032] In this embodiment, for movie production, the three dimensional digital representation of the scene satisfies at least one of: the distance is 10 cm – 50 m, and the frame rate is 23 – 250 fps. [0033] For volumetric capture, the three dimensional, digital representation of the scene satisfies at least one of: the distance is 1 m – 50 m, and the frame rate is 5 – 250 fps. [0034] For static capture, the three dimensional, digital representation of the scene satisfies at least one of: the distance is 50 cm – 200 m, and the frame rate is 5 – 250 fps. [0035] In this embodiment, the first camera and the second camera may be housed within a single housing. [0036] It is contemplated that, for this embodiment, the single housing may include a beam splitter to direct the light to the first camera and the nonvisible light to the second camera. [0037] As before, the first camera may encompass a plurality of first cameras and the second camera may encompass a plurality of second cameras. [0038] It is noted that the present invention is not limited solely to the features and aspects listed above. Still other features and aspects of the present invention will be made apparent by the discussion that follows. Brief Description of the Drawings [0039] The present invention will be described in connection with the drawings appended hereto, in which: [0040] Fig. 
1 is a graphical representation of a first contemplated embodiment of the optical system of the present invention; [0041] Fig.2 is a graphical representation of a second embodiment of the optical system of the present invention; [0042] Fig.3 is a graphical representation of a third embodiment of the optical system of the present invention; [0043] Fig. 4 is a graphical representation of one contemplated manipulation of digital information to generate a three dimensional, digital representation of a scene; [0044] Fig.5 provides side-by-side comparisons between digital maps created by systems in the prior art with the optical system of the present invention; [0045] Figs. 6 - 11 provide various images illustrating aspects associated with the manipulation of a three dimensional, digital representation of a scene created according to the present invention. Detailed Description of Embodiment(s) of the Invention [0046] The present invention will now be described in connection with several examples and embodiments. The present invention should not be understood to be limited solely to the examples and embodiments discussed. To the contrary, the discussion of selected examples and embodiments is intended to underscore the breadth and scope of the present invention, without limitation. As should be apparent to those skilled in the art, variations and equivalents of the described examples and embodiments may be employed without departing from the scope of the present invention. [0047] In addition, aspects of the present invention will be discussed in connection with specific materials and/or components. Those materials and/or components are not intended to limit the scope of the present invention. As should be apparent to those skilled in the art, alternative materials and/or components may be employed without departing from the scope of the present invention. [0048] In the illustrations appended hereto, for convenience and brevity, the same reference numbers are used to refer to like features in the various examples and embodiments of the present invention. The use of the same reference numbers for the same or similar structures and features is not intended to convey that each element with the same reference number is identical to all other elements with the same reference number. To the contrary, the elements may vary from one embodiment to another without departing from the scope of the present invention. [0049] Still further, in the discussion that follows, the terms “first,” “second,” “third,” etc., may be used to refer to like elements. These terms are employed to distinguish like elements from similar examples of the same elements. For example, one fastener may be designated as a “first” fastener to differentiate that fastener from another fastener, which may be designated as a “second fastener.” The terms “first,” “second,” “third,” are not intended to convey any particular hierarchy between the elements so designated. [0050] It is noted that the use of “first,” “second,” and “third,” etc., is intended to follow common grammatical convention. As such, while a component may be designated as “first” in one instance, that same component may be referred to as “second, “third,” etc., in a separate instance. The use of “first,” “second,” and “third,” etc., therefore, is not intended to limit the present invention. [0051] Before discussing the present invention, it is noted that the following documents are incorporated herein by reference: U.S. Patent Nos. 
8,471,895, 9,007,439, 10,218,962, 10,104,365, 10,437,082, 8,493,645, and 8,254,009. [0052] Fig. 1 is a graphical representation of an optical system 10 according to a first embodiment of the present invention. [0053] It is noted that the term “optical system” is employed, because the system of the present invention involves light and optics. Use of the word “optical” should not be understood to limit the scope of the present invention. As detailed in the pages that follow, the system involves components other than optical components. [0054] The optical system 10 combines several components that, together, generate a three dimensional, digital representation of a scene 12. In Fig. 1, the scene 12 encompasses three representative objects, a person 14, a train locomotive 16, and a tree 18. The optical system 10 of the present invention captures light reflected from the scene 12 and renders the three dimensional, digital representation from that light. [0055] The light received by the optical system 10 includes two components, as described in greater detail hereinbelow. First, the optical system 10 receives light from the scene 12. The light is used to generate a color map that is used to create the three dimensional, digital representation of the scene 12. It is noted that the light from the scene 12 may be provided by natural and/or artificial light sources. For example, the light from the scene 12 may combine natural sunlight together with lights provided, for example, from stage spotlights. Second, the optical system 10 receives nonvisible light from the scene 12. The nonvisible light is used to generate a depth map that is used, together with the color map, to generate the three dimensional, digital representation of the scene 12. [0056] Concerning the color map portion of the three dimensional, digital representation of the scene 12, it is noted that this color encompasses two varieties. First, the color map may encompass actual color, meaning red, blue, and green (“RGB”) components. Second, the color map may be a monochromatic color map, meaning that the color map encompasses a grayscale image. In the paragraphs that follow, reference to an RGB color map and reference to a grayscale digital map are used interchangeably. [0057] To generate the nonvisible light, the optical system 10 incorporates a nonvisible light illumination source 20. The nonvisible light illumination source 20 generates nonvisible light 22 that is used to illuminate the scene 12. The nonvisible light 22 may be any type of nonvisible light 22 from the electromagnetic spectrum. Nonvisible light 22 encompasses light that is outside of human visual perception. For the optical system 10 described herein, it is contemplated that the nonvisible light 22 is either infrared (“IR”) light or ultraviolet (“UV”) light. [0058] The optical system 10 includes a first camera 24 and a second camera 26. [0059] With reference to Fig.4, the first camera 24 is a color camera that incorporates a first sensor 28 comprising a plurality of first camera pixels 30. In operation, each first camera pixel in the plurality of first camera pixels 30 is associated with a color or a grayscale value. The first sensor 28 is capable of capturing light 32 and transforming the light into a first digital map 34. The first digital map 34 also is referred to herein as a color digital map. As noted above, the first digital map 34 may be an RGB map, or it may be a grayscale digital map. 
It is noted that each first camera pixel of the plurality of first camera pixels 30 is associated with a color at each azimuthal and longitudinal position in the two dimensional field of view of the first camera 24. [0060] With continued reference to Fig. 4, the second camera 26 is a depth camera that incorporates a second sensor 36 comprising a plurality of second camera pixels 38. In operation, each second camera pixel in the plurality of second camera pixels 38 is associated with a depth value. The second sensor 36 is capable of capturing the nonvisible light 40 reflected from the scene 12 and transforming the nonvisible light 40 into a second digital map 42. The second digital map 42 also is referred to herein as a depth digital map. It is noted that each second camera pixel of the plurality of second camera pixels 38 is associated with a depth to a surface, as well as the intensity of light reflected from that surface, at each azimuthal and longitudinal position in the two dimensional field of view of the second camera 26. [0061] Concerning the depth value, it is noted that the depth values are generated using optical time of flight (“oTOF”) information that is associated with the nonvisible light 40 received by the second sensor 36 from the scene 12. This is described in greater detail herein below. [0062] As also illustrated in Fig.4, the first digital map 34 and the second digital map 42 are inputted into the processor 46. The processor 46 combines the first digital map 34 and the second digital map 42 to create the three dimensional, digital representation 44 of the scene 12. The processor 46 should not be understood to be a single processor. The processor 46 may encompass a plurality of components (e.g., several processors) without departing from the scope of the present invention. Where there are multiple processors 46, the multiple processors 46 may be operated at different times, as discussed in greater detail below. [0063] The plurality of first camera pixels 30 in the first camera 24 generate the first pixel map 34 with a first resolution of about 100K – 400M. As should be apparent to those skilled in the art, this indicates that there are about 100K – 400M of the first camera pixels that comprise the plurality of first camera pixels 30. [0064] Similarly, the plurality of second camera pixels 38 in the second camera 26 generate the second pixel map 42 with a second resolution of about 100K – 400M. Therefore, this indicates that there are about 100K – 400M of the second camera pixels that comprise the plurality of second camera pixels 38. [0065] It is noted that the first resolution may be less than, equal to, or greater than the second resolution. In one contemplated embodiment, the first resolution is equal to the second resolution, but this is not intended to be limiting of the present invention. [0066] The first camera 24 and the second camera 26 are contemplated to operate synchronously. In particular, the cameras 24, 26 function so that, for each three dimensional, digital representation 44 generated, the first digital map 34 and the second digital map 42 have a scenic correlation therebetween. Here, it is desirable for the information recorded in the first digital map 34 to be at least nearly identical to the information recorded in the second digital map 42 – from the perspective of what action is happening within the scene 12. 
With a synchronous (or nearly synchronous) operation, the first digital map 34 and the second digital map 42 may be processed to create the three dimensional, digital representation 44 of the scene 12, because the information in the digital maps 34, 42 correlates with one another. From one perspective, it can be understood that the first digital map 34 is created nearly at the same time (or nearly the same time) as the second digital map 42, so that both digital maps 34, 42 capture effectively the same information from the scene 12. Looking at this from a slightly different perspective, it can be understood that the delay from generating the first digital map 34 to the generation of the second digital map 42 is less than about ½ (one half) of the frame length. [0067] With renewed reference to Fig. 1, the processor 46 in the optical system 10 is connected to the first camera 24 via a first communication link 48. The processor 46 connects to the second camera 26 via a second communication link 50. And, as shown, the processor 46 connects to the nonvisible light illumination source 20 via a third communication link 52. [0068] As also illustrated in Fig.1, the processor 46 may be connected to a display 54 via a fourth communication link 56. The processor 46 also may connect to a memory 58 via a fifth communication link 60. [0069] It is contemplated that the processor 46 may be of any type capable of executing instructions, in the form of software, to generate the three dimensional, digital representation 44 by combining first digital map 34 with the second digital map 42. Any suitable processor 46 may be employed for this purpose. [0070] It is noted that the processor 46, in this embodiment, is contemplated to generate instructions to the nonvisible light illumination source 20 to generate the nonvisible light 22 that is used to illuminate the scene 12. Alternatively, instructions may be issued to the nonvisible light illumination source 20 via other avenues that should be apparent to those skilled in the art. [0071] The three dimensional, digital representation 44 is contemplated to satisfy at least one of the following parameters. [0072] First, the three dimensional, digital representation 44 is contemplated to comprise an image resolution with an image pixel count of ≥ 0.9 M. The image pixel count is understood to be suitable for a final pixel presentation. More specifically, the image pixel count may be matched to a display pixel count associated with the display 54, but a matched pixel count is not required for implementation of the present invention. [0073] As should be apparent to those skilled in the art, standard, commercially available displays 54 have a display pixel count (or “pixel density”) that may be anywhere from 720p to 8K. At present, these resolutions include 720p (1280 x 720 pixels, a total of 921,600 pixels), 1080p (1920 x 1080 pixels, a total of 2,073,600 pixels), 1440p (2560 x 1440 pixels, a total of 3,686,400 pixels), 2K (2048 x 1080 pixels, a total of 2,211,840 pixels), 4K (3480 x 2160 pixels, a total of 7,516,800 pixels), 5K (5120 x 2880 pixels, a total of 14,745,600 pixels), and 8K (7860 x 4320 pixels, a total of 33,955,200 pixels). So that the three dimensional, digital representation 44 may be displayed readily on the display 54, it is contemplated that the image resolution of the three dimensional, digital representation 44 will match the display resolution of the display 54. 
Alternatively, the image resolution of the three dimensional, digital representation 44 may be higher or lower than that the display resolution. If so, appropriate corrections may be made for display of the three dimensional, digital representation 44 on the display 54, as should be apparent to those skilled in the art. [0074] Second, the three dimensional, digital representation 44 is contemplated to have an image latency of 10 ms – 30 sec. The image latency comprises a processing time between receipt of the first and second digital maps 34, 42 by the processor 46 and the generation of the three dimensional, digital representation 44 of the scene 12. [0075] For the present invention, it is noted that the image latency is not limited solely to a range of 10 ms – 30 sec. In the context of the present invention, the image latency may be defined as any range, defined by endpoints, in 10 ms increments, from 10 ms to 30 sec. For example, in one contemplated embodiment, the image latency range may be 500 ms – 1 sec. Another embodiment contemplates a range for the image latency from 10 ms – 1900 ms. In addition, the present invention also encompasses any specific image latency that is equal to each 10 ms endpoint from 10 ms to 30 sec. In this context, specific values for the image latency may be 10 ms, 20 ms, 30 ms, 40 ms, 50 ms, and 1010 ms, as representative examples. [0076] Third, the three dimensional, digital representation 44 is contemplated to incorporate depth information encompassing a distance of 10 cm – 200 m. The distance is measured from the first and second cameras 24, 26 to an object (e.g., the person 14, the locomotive 14, and/or the tree 18) in the scene 12. In other words, the objects in the scene 12 are contemplated to be 1 – 200 m from the first camera 24 and the second camera 26. [0077] For the present invention, it is noted that the distance is not limited solely to a range of 10 cm – 200 m. In the context of the present invention, the distance may be defined as any range, defined by endpoints, in 10 cm increments, from 10 cm to 200 m. For example, in one contemplated embodiment, the distance may be 10 cm – 5 m. Another embodiment contemplates a range for the distance from 20 m – 100 m. In addition, the present invention also encompasses any specific distance that is equal to each 10 cm endpoint from 10 cm to 200 m. In this context, specific values for the image latency may be 10 cm, 20 cm, 1 m, 5 m, 10 m, and 100 m, as representative examples. [0078] With renewed reference to Fig.1, a first object distance 62 is shown from the first camera 24 and from the second camera 26 to the person 12. A second object distance 64 designates a separation between the locomotive 16 and the first and second cameras 24, 26. Finally, a third object distance 66 identifies the separation between the tree 18 and the first and second cameras 24, 26. Each of these distances 62, 64, 66 are contemplated to lie within the distance range of 10 cm – 200 m. [0079] Fig. 1 also illustrates a camera separation distance 68. The camera separation distance 68 is a distance separating the first camera 24 from the second camera 26. [0080] As should be apparent from Fig.1, it is contemplated that the first camera 24 may be separated from the second camera 26 by a distance, identified as the camera separation distance 68. The camera separation distance is contemplated to fall within a range of about 1 cm – 2 m. 
Still further, it is contemplated that the first camera 24 and the second camera 26 may be combined into a single unit where the single camera detect both light 32 and nonvisible light 40. [0081] For the present invention, it is noted that the camera separation distance is not limited solely to a range of 1 cm – 2 m. In the context of the present invention, the camera separation distance may be defined as any range, defined by endpoints, in 1 cm increments, from 1 cm to 2 m. For example, in one contemplated embodiment, the camera separation distance may be 1 cm – 1 m. Another embodiment contemplates a range for the camera separation distance from 50 cm – 1.5 m. In addition, the present invention also encompasses any specific camera separation distance that is equal to each 1 cm endpoint from 1 cm to 2 m. In this context, specific values for the image latency may be 1 cm, 2 cm, 5 cm, 10 cm, and 1 m, as representative examples. [0082] In one embodiment of the present invention, as noted above, it is contemplated that the optical system 10 will include a display 54. As shown, the display 54 is contemplated to connect to the processor 46 via the fourth communication link 56. When provided, the display 54 displays the three dimensional, digital representation 44. It is noted that the display 54 may be omitted without departing from the scope of the present invention. [0083] As also shown in Fig.1, the optical system 10 may include a memory 58 connected to the processor 46 via the fifth communication link 60. The memory 58 is contemplated to satisfy one or more operating parameters. As should be apparent, the memory 58 provides a location where the three dimensional, digital representation 44 may be stored. Still further, the memory 58 may store the software for execution by the processor 46. It is noted that the memory 58 may be omitted without departing from the scope of the present invention. [0084] As noted above, in one embodiment of the present invention, it is contemplated that the display 54 is provided with a display resolution that matches the image resolution of the three dimensional, digital representation 44. However, this construction is not required to remain within the scope of the present invention. It is contemplated that, in other embodiments, the image resolution of the three dimensional, digital representation 44 may differ from the display resolution. [0085] To this point, the three dimensional, digital representation 44 of the scene 12 has been described as a single frame or still shot image of the scene 12. [0086] When successive ones of the three dimensional, digital representations 44 of the scene 12 are assembled sequentially, they form a video. The present invention contemplates that successive ones of the three dimensional, digital representations 44 of the scene 12 may be assembled sequentially by the processor 46 to generate the video. If so, the video is contemplated to have a frame rate of between about 5 frames per second (“fps”) –250 fps. [0087] For the present invention, it is noted that the frame rate is not limited solely to a range of 5 – 250 fps. In the context of the present invention, the frame rate may be defined as any range, defined by endpoints, in 5 fps increments, from 5 fps – 250 fps. For example, in one contemplated embodiment, the frame rate may be 5 fps – 50 fps. Another embodiment contemplates a range for the frame rate from 15 fps – 25 fps. 
In addition, the present invention also encompasses any specific frame rate that is equal to each 5 fps endpoint from 5 fps – 250 fps. In this context, specific values for the frame rate may be 5 fps, 10 fps, 15 fps, 50 fps, and 100 fps, as representative examples. The present invention also is intended to encompass the standard frame rate of 24 fps. [0088] Fig. 2 provides a graphical representation of a second embodiment of an optical system 70 contemplated by the present invention. [0089] In this embodiment, the optical system 70 shares many of the features described in connection with the optical system 10 illustrated in Fig. 1. In this embodiment of the optical system 70, the nonvisible light illumination source 20, the first camera 24, and the second camera 26 are disposed within a first housing 72. The first housing 72 includes a first optical element 74 that directs the light 32 to the first camera 24. The first housing 72 also includes a second optical element 76 that directs the nonvisible light 40 to the second camera 26. The first and second optical elements 74, 76 may be lenses, for example. Alternatively, the first and second optical elements 74, 76 may be apertures in the first housing 72. Still further, the first and second optical elements 74, 76 may combine lenses and/or apertures as may be required and/or as desired. [0090] Fig. 3 is a graphical representation of a third embodiment of an optical system 78 according to the present invention. [0091] In the optical system 78, the first camera 24 and the second camera 26 are enclosed in a second housing 80 such that the first camera 24 is positioned perpendicularly to the second camera 26. It is noted that the perpendicular disposition of the first camera 24 in relation to the second camera 26 is merely exemplary and is not limiting of the present invention. Any suitable spatial positioning of the first camera 24 in relation to the second camera 26 is contemplated to fall within the scope of the present invention. [0092] In this embodiment, the light 32 and the nonvisible light 40 enter through a third optic 82 into the second housing 80. The third optic 82 may be a lens or a plurality of lenses. Alternatively, as before, the third optic 82 may be an aperture in the second housing 80. Still further, the third optic 82 may combine lenses and/or apertures as required and/or as desired. [0093] For the optical system 78, the light 32 and the nonvisible light 40 pass through a fourth optic 84. At the fourth optic 84, the light 32 and the nonvisible light 40 are split from one another. For this reason, the fourth optic 84 also is referred to as an optical splitter. In the illustrated embodiment, the optical splitter 84 permits the light 32 to pass therethrough to the first camera 24. The nonvisible light 40 is redirected, perpendicularly to the light 32, so that the nonvisible light 40 is directed to the second camera 26. [0094] In the illustrated embodiment, the fourth optic 84 is contemplated to be an optical component referred to as a beam splitter. However, any other optical component that splits the incoming light into the light 32 and the nonvisible light 40 may be employed without departing from the scope of the present invention. [0095] The optical system 70 and the optical system 78 are contemplated to operate in the same manner as discussed in connection with the optical system 10.
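For purposes of illustration only, the following Python sketch shows one way a frame-by-frame capture could be paced in software when a visible-light camera and a nonvisible-light camera share an optical path, as in the optical system 78, so that each visible frame is paired with a nonvisible frame for later processing into the three dimensional, digital representation. The camera objects, their grab() method, and the chosen frame rate are hypothetical placeholders, not an actual device interface; in practice, synchronization would more likely be handled by hardware triggering.

    import time

    class FramePair:
        """A visible-light frame and a nonvisible (IR) frame captured through the shared optic."""
        def __init__(self, visible, nonvisible, timestamp):
            self.visible = visible          # e.g., data for the first digital map (color image)
            self.nonvisible = nonvisible    # e.g., data used to build the second digital map
            self.timestamp = timestamp

    def capture_paired_frames(color_camera, depth_camera, frame_rate_fps=24.0, n_frames=48):
        """Trigger both cameras together and pace the loop at the requested frame rate.

        `color_camera` and `depth_camera` are assumed to expose a `grab()` method
        returning an image; both names are placeholders, not a real driver API.
        """
        period = 1.0 / frame_rate_fps
        frames = []
        for _ in range(n_frames):
            t0 = time.monotonic()
            visible = color_camera.grab()      # light passed through the splitter
            nonvisible = depth_camera.grab()   # light redirected by the splitter
            frames.append(FramePair(visible, nonvisible, t0))
            # Sleep off the remainder of the frame period, if any, to hold the frame rate.
            elapsed = time.monotonic() - t0
            if elapsed < period:
                time.sleep(period - elapsed)
        return frames

    class _StubCamera:
        """Stand-in for a real camera driver; returns an empty placeholder frame."""
        def grab(self):
            return None

    pairs = capture_paired_frames(_StubCamera(), _StubCamera(), frame_rate_fps=24.0, n_frames=3)
    print(len(pairs), "frame pairs captured")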
[0096] For the present invention, the optical systems 10, 70, 78 are contemplated to operate in at least one of five contemplated configurations. The five configurations include, but are not limited to: (1) movie production, (2) “pre-viz” movie production, (3) live broadcast production, (4) volumetric capture, and (5) static capture. Each of these five configurations is discussed below. [0097] In the first configuration, the optical system 10, 70, 78 is contemplated to operate for movie production. [0098] Movie production involves the creation of a video that captures visual information from, for example, actors disposed within the scene 12. When the optical system 10, 70, 78 is employed in the context of movie production, the “raw data” of the action generated by the actors in the scene 12 is captured. That “raw data” may encompass, for example, actors performing in front of a blue or green screen, which is a blue or green background employed by those skilled in the art. When actors perform in front of a blue or green screen, background elements are inserted at a later date, during editing, for example. Alternatively, actors may perform in front of one or more light emitting diode (“LED”) walls (otherwise referred to as “light walls”) that display digitally-generated backgrounds. Both approaches are commonly found in Virtual Production for movies and series features, for cinematic, TV, or personal device viewing. [0099] To operate for movie production, it is contemplated that the optical system 10, 70, 78 will operate according to the following parameters. First, the image resolution has an image pixel count of greater than or equal to 0.9 megapixels (“M”) (“0.9 M”), which refers to the total number of pixels in the three dimensional, digital representation 44. Second, the distance from the first and second cameras 24, 26 to the scene 12 is between about 10 cm – 50 m. Third, the frame rate of the video is between about 23 – 250 fps. [00100] Movie production also encompasses post-production processing to create a final version of a movie that has been prepared for projection to an audience. Post-production editing and manipulation of the action captured during the movie production phase can take days, weeks, months, or even years to generate the three dimensional, digital representation 44. Here, latency is not a parameter. In addition, the processor 46 may encompass several processors that are operated at various times during post-production. [00101] In a second configuration, the optical system 10, 70, 78 is contemplated to operate for “pre-viz” movie production. [00102] “Pre-viz” movie production differs from movie production in that the optical system 10, 70, 78 incorporates the display 54 and the three dimensional, digital representation 44 incorporates at least a rough, rendered environment in which the actors are inserted. The term “pre-viz” (which stands for “pre-visualization”) indicates that the information generated by the optical system 10, 70, 78 is produced after the movie production stage but before any editing and production stage where final environments are rendered. [00103] A pre-viz video is contemplated to be displayed to a movie director and/or producer immediately after the movie production so that the producer and/or director may judge if the actors have performed in a satisfactory manner. Effectively, “pre-viz” movie production may be understood as a preview of what the final movie will look like after editing is completed.
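As a rough, non-limiting aid to the discussion of operating configurations, the following Python sketch encodes the movie production parameters recited in paragraph [0099] as a simple envelope check. The class and function names are illustrative only, and similar envelopes could be defined for the other configurations discussed in the following paragraphs.

    from dataclasses import dataclass

    @dataclass
    class ParameterEnvelope:
        """Operating envelope for one configuration (values taken from the text above)."""
        min_pixels: int          # total pixel count of the three dimensional representation
        min_distance_m: float    # nearest contemplated object distance
        max_distance_m: float    # farthest contemplated object distance
        min_fps: float
        max_fps: float

    # Movie production, per paragraph [0099]: >= 0.9 M pixels, about 10 cm - 50 m, about 23 - 250 fps.
    MOVIE_PRODUCTION = ParameterEnvelope(900_000, 0.10, 50.0, 23.0, 250.0)

    def within_envelope(envelope, pixels, object_distance_m, frame_rate_fps):
        """Return True if a proposed capture setup falls inside the stated envelope."""
        return (pixels >= envelope.min_pixels
                and envelope.min_distance_m <= object_distance_m <= envelope.max_distance_m
                and envelope.min_fps <= frame_rate_fps <= envelope.max_fps)

    # Example: a 1280 x 720 depth output (~0.92 M pixels) of a subject 12 m away at 24 fps.
    print(within_envelope(MOVIE_PRODUCTION, 1280 * 720, 12.0, 24.0))  # True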
[00104] To operate for pre-viz movie production, it is contemplated that the optical system 10, 70, 78 will operate according to the same parameters identified for movie production. For pre-viz movie production, the optical system 10, 70, 78 also is contemplated to satisfy an image latency that is between about 10 ms – 10 s. [00105] In a third configuration, the optical system 10, 70, 78 is contemplated to operate for live broadcast production. This differs from movie production and pre-viz movie production in that the action is broadcast live, such as would be expected for a live sporting event or musical concert, for example. While there is contemplated to be a delay from capture of the live images to broadcast of the live images to viewers, the delay is short. [00106] To operate for live broadcast production, it is contemplated that the optical system 10, 70, 78 will operate according to the following parameters. First, the image resolution has the image pixel count of greater than or equal to 0.9 M. Second, the distance from the first and second cameras 24, 26 to the scene 12 is between about 2 m – 200 m. Third, the frame rate of the video is between about 23 – 250 fps. Fourth, the latency is between about 10 ms – 1 s. [00107] In a fourth configuration, the optical system 10, 70, 78 is contemplated to operate for volumetric capture. [00108] Volumetric capture refers to the capture of an image and/or video within a defined spatial volume, such as on a sound stage. Volumetric capture is employed in instances where the elements of the scene 12 and the actors are disposed within a predefined space. Volumetric capture may involve generating three dimensional, digital representations 44 from multiple viewpoints and/or multiple angles with respect to the volumetric space. [00109] For volumetric capture, at least two images, taken from different viewpoints, are required to generate the three dimensional, digital representation 44. [00110] To operate for volumetric capture, it is contemplated that the optical system 10, 70, 78 will operate according to the following parameters. First, the image resolution has the image pixel count of greater than or equal to 0.9 M. Second, the distance from the first and second cameras 24, 26 to the scene 12 is between about 1 m – 50 m. Third, the frame rate of the video is between about 5 – 250 fps. Fourth, the latency is between about 10 ms – 1 s. [00111] In a variation of volumetric capture, which involves post-production processing, the three dimensional, digital representation 44 may take hours, days, weeks, months, or years to create, depending on the amount of post-production processing required and/or desired. As such, for this variation, latency is not a parameter. Accordingly, in this post-production volumetric capture environment, the optical system 10, 70, 78 will operate according to the following parameters. First, the image resolution has the image pixel count of greater than or equal to 0.9 M. Second, the distance from the first and second cameras 24, 26 to the scene 12 is between about 1 m – 50 m. Third, the frame rate of the video is between about 5 – 250 fps. [00112] In a fifth configuration, the optical system 10, 70, 78 is contemplated to operate for static capture. Static capture refers to the capture of an image and/or video from a single viewpoint, such as would exist if a person were to take a picture using his or her cell phone camera, for example. It is noted that static capture encompasses two separate conditions.
In a first condition, referred to as “static scene capture,” the camera 24, 26 moves in relation to the scene 12 and the objects 14, 16, 18 in the scene 12 are stationary. In a second condition, referred to as “static camera capture,” the camera 24, 26 is stationary with respect to the scene 12 and the objects 14, 16, 18 move within the scene 12. Both conditions are intended to be encompassed by use of the term “static capture.” [00113] For both static scene capture and static camera capture, it is contemplated that the optical system 10, 70, 78 will operate according to the following parameters. First, the image resolution has the image pixel count of greater than or equal to 0.9 M. Second, the distance from the first and second cameras 24, 26 to the scene 12 is between about 50 cm – 200 m. Third, the frame rate of the video is between about 5 – 250 fps. Fourth, the latency is between about 10 ms – 1 s. [00114] In a variation of static camera capture, which involves post-production processing, the three dimensional, digital representation 44 may take hours, days, weeks, months, or years to create, depending on the amount of post-production processing required and/or desired. As such, for this variation, latency is not a parameter. Accordingly, in this post-production static camera capture environment, the optical system 10, 70, 78 will operate according to the following parameters. First, the image resolution has the image pixel count of greater than or equal to 0.9 M. Second, the distance from the first and second cameras 24, 26 to the scene 12 is between about 50 cm – 200 m. Third, the frame rate of the video is between about 5 – 250 fps. [00115] To assist with a more in-depth understanding of the optical systems 10, 70, 78 discussed in connection with Figs. 1 – 4, the following additional information is provided. This information is intended to be applicable to each of the optical systems 10, 70, 78. [00116] The optical systems 10, 70, 78 of the present invention encompass systems and methods that permit the generation of high resolution images of scenes, including wide field of view scenes. Specifically, the systems and methods are contemplated to record, simultaneously, three dimensional position information for multiple objects in a scene with high spatial and distance resolution, along with intensity (grey-scale or color) information about the scene. Both the position and intensity information are recorded for every pixel in an array of pixels for each image. The intensity and position information are combined into a single three-dimensional image that approximates a human view of the scene. This is referred to, herein, as the three dimensional, digital representation 44. [00117] The availability of high density depth data makes it possible to accomplish many things that are now done either manually or with many complex steps and physical scenes. This depth information can be in the form of a 2D depth map, which provides a distance value for the surface imaged by each pixel, or it can be in the form of a more extensive 3D representation of the location of the surfaces present in an area or scene, such as a point cloud, surface mesh, voxel grid, or similar way of representing such 3D location information. Such representations can also account for how the location information changes over time. General description [00118] Ways to capture reality in 3D have been known almost from the beginnings of photography.
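By way of illustration of the depth data forms mentioned in paragraph [00117], the following Python sketch converts a 2D depth map into a point cloud using an ideal pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) are assumed calibration values for illustration and are not taken from the disclosure.

    import numpy as np

    def depth_map_to_point_cloud(depth_m, fx, fy, cx, cy):
        """Back-project a 2D depth map (metres per pixel) into an N x 3 point cloud.

        Assumes an ideal pinhole model; fx, fy, cx, cy are intrinsic parameters that
        would come from calibration of the depth camera and are placeholders here.
        """
        h, w = depth_m.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_m
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]   # drop pixels with no valid depth

    # Example: a synthetic 480 x 640 depth map of a flat surface 2 m away.
    depth = np.full((480, 640), 2.0)
    cloud = depth_map_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    print(cloud.shape)   # (307200, 3)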
[00119] There are many ways in which people have attempted to determine the 3D location of surfaces or segments of a surface. [00120] Photogrammetry is an example of a broader class of approaches (stereo, structured light, structure from motion) that use geometry to determine the location of points, objects, or surfaces. Photogrammetry is akin to triangulation in navigation. Microsoft, Apple, and Intel as well as others have created products based on this type of approach, with limited performance. [00121] It is known that geometry-based approaches are limited in operating range, usually a few meters. Geometry-based approaches also require substantial computational power to correctly compute the location solution for pixel densities relevant for imaging, for example, > 100,000 points. Such products have been used to attempt to capture 3D information for many applications and use cases from robotics to motion pictures. However, the performance of these products has been so poor that many have concluded that such 3D cameras or capture devices are not relevant for these use cases. [00122] The other traditional approach to 3D capture is to use electronic means to detect when illumination light makes it back to the camera or sensor. This is usually called time-of-flight (“TOF”) and has many subcategories such as direct TOF, indirect TOF, phase-based TOF, frequency modulated continuous wave (“FMCW”), amplitude modulated continuous wave (“AMCW”), linear-mode and Geiger-mode avalanche photodiodes (“APDs”), single photon avalanche diode (“SPAD”), among others. All of these approaches have, in common, an electronic detector of some fashion that measures a phase change or arrival time of the returned light. [00123] It is known, for example, that single point scanners are able to achieve very high accuracy and point density over long ranges up to 1 km or more. However, single point scanners take minutes to hours to collect a reasonable number of points. [00124] Multi-point scanners typically have poorer accuracy and poorer point density than single point scanners. Moreover, multi-point scanners still take many seconds to collect a reasonable number of points. [00125] Imaging arrays that can capture image-like depth data have been limited to short range, e.g., < a few meters, or very low point density, e.g., < 20,000 points or pixels, or both. Those imaging arrays that are able to achieve longer ranges also suffer from very high costs, e.g., $10,000 or more. Imaging arrays have been tried for many use cases, but, with the exception of scanning of static areas where high cost is acceptable, the performance is so poor that adoption has not occurred, and many experts have concluded that such technologies are not relevant for those use cases. [00126] The final class of 3D capture technologies is optically based, where the properties of light itself are used to determine changes in distance. Traditionally, these 3D capture technologies have included interferometric techniques and coherent holography. Such systems are extremely costly (e.g., $100K and up). While these technologies are very precise, they do not work well outside laboratories and are limited in operating range and/or point density. They are not compatible with the broader commercial use cases described here. [00127] As detailed herein, the optical systems 10, 70, 78 of the present invention offer a new approach to solve the 3D capture problem in a practical way that can enable the use of 3D information to enhance and improve these use cases. 
Some of the key elements that are required to achieve improvements in these use cases are: resolution, operating range (e.g., distance), and speed (in terms of latency and speed of capture). [00128] As noted above, the optical systems 10, 70, 78 of the present invention rely on an all-optical TOF (“oTOF”) approach to the capture of information used to generate the second digital map. [00129] In one contemplated embodiment, oTOF uses an external modulation device in front of a normal sensor (e.g., the second camera 26) coupled with a pulsed illumination (e.g., by the nonvisible light illumination source 20) to create a modulated image and a non-modulated or reference image. The ratio of these two images, multiplied by the appropriate factor, is a direct measurement of the time when the illumination light returns to the camera, from which the second digital map is generated, and this measurement is achieved without measuring the time using any electronic means. The result of the combination of these variables is a device that can achieve all three key elements at the same time in a single device, e.g., the optical system 10, 70, 78. An added bonus is that the technology scales in cost as any 2D camera does and so can achieve costs that are relevant for almost any industry or use case. Resolution [00130] The lateral or transverse resolution or point density of the 3D points is related to the feature size that can be identified or manipulated or measured. Higher density, just as in a normal 2D image, means finer and finer detail can be acquired and/or used. [00131] While an estimated 20,000 pixel or 100,000 pixel image may have been useful in the 1800s or early 1900s, today’s applications all expect something close to HD, which was defined to be close to human vision. It is also useful to note that the upscaling and heavy filtering that some 3D camera products employ to achieve a published specification of 300,000 points or even 1,000,000 points do not achieve the performance needed. [00132] For example, Fig. 5 illustrates some comparisons of the images 86, 88, 90 produced by current (prior art) devices (left column) and the optical systems 10, 70, 78 of the present invention (right column), images 92, 94, 96. [00133] With reference to the top row of images, a depth map (an image with greys representing distance for each point) from a Microsoft Kinect Azure that reaches up to 5 m is compared with a 700,000 point oTOF depth map (according to the optical systems 10, 70, 78 of the present invention) that reaches 10 m. Image 86 is representative of the Microsoft Kinect Azure approach. The real resolution (e.g., the size of features that can be readily detected or identified) of the optical system 10, 70, 78 is much higher than that of the prior art, as shown. For example, the fingers, cap bill, and spokes in the bench wagon wheels are clearly visible in the 3D information, e.g., the second digital map produced by the optical system 10, 70, 78. This is the image 92. [00134] With continued reference to Fig. 5, the second row of images compares a typical (prior art) point cloud (the collection of 3D points rendered in 3D coordinates) from a photogrammetric solution at ~50 m scale (left side, image 88) to a point cloud created from an oTOF video feed that reaches 20 m outdoors according to the present invention (right side, image 94). As is apparent, the image 94 generated according to the optical system 10, 70, 78 of the present invention shows branches and leaves even at these long distances.
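As a simplified, hypothetical illustration of the ratio-based measurement described in paragraph [00129], the following Python sketch recovers a per-pixel distance from a modulated image and a reference image under the assumption of a linear modulation ramp. The actual modulation function, calibration factors, and noise handling of an oTOF camera are not specified here and are not modeled.

    import numpy as np

    C = 299_792_458.0   # speed of light, m/s

    def otof_depth_from_ratio(modulated, reference, gate_duration_s, gate_delay_s=0.0):
        """Toy illustration of recovering distance from a modulated/reference image pair.

        Assumes the external modulator imposes a transmission that ramps linearly from
        0 to 1 over `gate_duration_s`, so the per-pixel ratio is proportional to the
        arrival time of the return pulse within the gate.  This is an assumed model,
        not the disclosed implementation.
        """
        modulated = modulated.astype(float)
        reference = reference.astype(float)
        ratio = np.divide(modulated, reference,
                          out=np.zeros_like(reference), where=reference > 0)
        ratio = np.clip(ratio, 0.0, 1.0)
        arrival_time_s = gate_delay_s + ratio * gate_duration_s   # round-trip time of flight
        return 0.5 * C * arrival_time_s                           # one-way distance per pixel

    # Example: a pixel whose return arrives halfway through a 100 ns gate maps to ~7.5 m.
    mod = np.array([[50.0]]); ref = np.array([[100.0]])
    print(otof_depth_from_ratio(mod, ref, gate_duration_s=100e-9))   # ~[[7.49]]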
[00135] It is known that high cost precision laser scanners can produce a resolution that is similar to the resolution of the oTOF point cloud, e.g., the three dimensional, digital representation 44 of the present invention. However, such laser scanners take much longer times to produce an output than the optical systems 10, 70, 78. [00136] Again, with reference to Fig. 5, the bottom row of images compares a depth map from a multi-point LIDAR scanner (prior art, left image 90) with a 700,000 point oTOF depth map (e.g., the three dimensional, digital representation 44) (invention, right image 96). The colors in these images represent the distance at each point or pixel. In the image on the right, the tricycle is approximately 30 m from the camera or capture device. As should be apparent from these three comparisons, the optical system 10, 70, 78 of the present invention produces a three dimensional, digital representation 44 with improved resolution or point density, which is evidenced by the ability to identify features or objects, as is important in the present invention. [00137] As noted above, for some projects, it may be desirable to obtain a depth map pixel count that is greater than about 100,000 pts/pixels (with a spatial frequency performance appropriate for such a pixel count instead of the Kinect prior art device). In other projects, a depth map of greater than about 300,000 points/pixels (VGA equivalent), or greater than about 500,000 points/pixels, or greater than about 700,000 points/pixels, or greater than about 1,000,000 points/pixels may be desired. Still further, it may be desirable to produce a format approximately equivalent to a 480p image, a 720 p image, a 1080p image, a 2K image, a 4K image, or an 8K image. Still other formats between these formats or larger may be desired. The optical system 10, 70, 78 of the present invention is capable of achieving these objectives, unlike the prior art. Range [00138] The range provided by the optical system 10, 70, 78 of the present invention also improves over the prior art. The range also is discussed herein as the distances 62, 64, 66 from the first and second cameras 24, 26 to objects 14, 16, 18 in the scene 12. The differences in operating range can also be compared with reference to Fig.5. [00139] The Microsoft Kinect system at the top row, left side, cannot achieve the distances/ranges of 20 – 30 m. Even at 5 m, there are many missing points, which prevent the product from being a solution for many of the use cases described herein. [00140] The use cases described below, when in an indoor set, may have objects that are within 1 m of the camera or within 3 m of the camera or they may be placed further away. For example, the objects may be placed at distances of > 3 m from the first and second cameras 24, 26, > 10 m, > 20 m, > 30 m, or, in some cases, > 100 m. On some sets or in some projects, the objects may be static or moving or may have video displays with moving or static images. These objects may be located between 1 m and 10 m from the first and second cameras 24, 26 or from 3 m to 30 m from the first and second cameras 24, 26, or 1 m to 30 m or between 2 m and 20 m, or other ranges of locations as may suit the needs. At times the objects may move outside these ranges/distances. In uses that are outdoors, the objects may be located in similar fashion as indoors. 
They may also be located between 10 m and 100 m from the first and second cameras 24, 26, between 1 m and 100 m from the first and second cameras 24, 26, between 10 m and 50 m, between 3 m and 50 m, or other location ranges as may suit. The optical system 10, 70, 78 of the present invention is capable of accommodating these distances. Furthermore, depth cameras such as the Kinect camera exhibit degraded range and accuracy in outdoor scenes. The optical system 10, 70, 78 of the present invention is capable of accommodating outdoor scenes, or indoor scenes with lighting that varies over time. Speed/Latency [00141] Speed refers to how long it takes to acquire and use the 3D data of interest: for example, how long it takes to acquire the equivalent of a frame of depth data (or a depth map or a point cloud). It also refers to the latency or lag between the physical event and the time when the 3D data is available to be used in the workflow, and to how fast such 3D data are available in an ongoing fashion (e.g., the frame rate). In the context of the optical system 10, 70, 78 of the present invention, the latency encompasses the time between when the images are captured by the first and second cameras 24, 26 and the generation of the three dimensional, digital representation 44 by the processor 46. [00142] It is noted that, in large studio photogrammetry solutions (such as those built by Canon, Microsoft, or Intel), it can take hours to days to do all the calculations necessary to compute 1 second of 3D data. Smaller, multi-camera volumetric capture solutions can take days to calculate the 3D data. Using prior art devices, such large time scales are not practical for these use cases and are too costly. [00143] With renewed reference to Fig. 5, it is noted that, for the point cloud image in Fig. 5, the second row, right-hand image, it is possible that prior art precision laser scanners can achieve a resolution approximating the resolution of the present invention. However, to do so, precision laser scanners known in the prior art require minutes to tens of minutes per scan to collect. Moreover, multiple scans typically are needed to avoid shadows or obscurations. In other words, the latency of the prior art is simply prohibitive of generating a three dimensional, digital representation akin to that of the present invention. [00144] For purposes of the present invention, the optical system 10, 70, 78 operates with latencies of 5 sec or less, latencies of < 1 sec, < 200 ms, < 100 ms, or < 50 ms (or the equivalent in number of frames or other metric). In other contemplated embodiments of the optical system 10, 70, 78 of the present invention, latencies may be < 10 minutes, < 5 minutes, < 1 min, or < 30 seconds. [00145] The present invention also is contemplated to be suitable for projects that require frame rates of 10 fps or higher, including frame rates of > 1 fps, > 20 fps, approx. 24 fps, approx. 30 fps, > 30 fps, approximately 48 fps, approximately 60 fps, > 90 fps, approximately 98 or 100 fps, approximately 120 fps, or other frame rates as may suit. These frames may need to be synchronized with other systems or cameras. Grid or mesh generation [00146] As may be apparent from the foregoing discussion, the second camera 26 of the optical system 10, 70, 78 of the present invention initially creates a 2D array of distance measurements that corresponds to each pixel of the image sensor(s) used in the 3D camera.
This depth map (i.e., the second digital map 42) may be used as is in some applications, such as projects where the information may be displayed on a 2D monitor or display using a virtual camera or point of view. [00147] For other uses, the points of the depth map (e.g., the second digital map 42) may be used as vertices to create a mesh of polygons, for example, triangles or quadrangles, by connecting the points. The manner in which these points are connected may be simple, or it may be more sophisticated, where similar polygons are combined into larger polygons that represent the surface location over a large area if there is not much distance change in that area. [00148] In some cases, the depth map (the second digital map 42) may be used to calculate other vertices that are attached to or part of a fixed grid or pattern in a volume of space, such as a voxel grid. The method of such calculation may be based on a confidence weighting factor for the distances in the depth map. [00149] For oTOF cameras, such as the second camera 26, the intensity of the IR is recorded or provided as an intensity map or IR image. This comes from the same array of pixels, and so each IR pixel corresponds to a depth value of the depth map. This information can be used to colorize or associate a monochrome or color value with each point or vertex. Alternatively, texture coordinates or other equivalent representations may be generated which provide a correlation between the 3D mesh and the portion of the image or texture that corresponds to that 3D surface. The texture and 3D mesh can then be rendered in appropriate software, such as game engines. Combination with RGB or other 2D cameras [00150] In some cases, it is desirable to have color or other 2D imagery (e.g., the first digital map 34) or other information coordinated with 3D data (e.g., the second digital map 42) from 3D oTOF cameras (e.g., the second camera 26). Figs. 1 and 2 illustrate examples of how the first and second cameras 24, 26 may be oriented next to each other, separated physically by some distance. As should be apparent from the foregoing discussion, the word “camera” may refer to a separately housed and lensed system, or modules integrated within a common housing, or even sensors that share optical systems in an appropriate fashion as described elsewhere. Obscura In Fill [00151] An issue that arises when combining imagery from multiple sources is parallax, as illustrated in Figs. 1 and 2. This parallax results in the location of each pixel’s field of view (“iFOV”) being different between the first and second cameras 24, 26 and/or the images produced thereby (e.g., the first digital map 34 and the second digital map 42), the difference being dependent on the separation between the first and second cameras 24, 26 and the distance 62, 64, 66 that the object 14, 16, 18 is from the first and second cameras 24, 26. Another impact is that there will be surfaces behind the foremost objects 14, 16, 18 that are blocked from the view of one or more of the first and second cameras 24, 26. For example, if the first and second cameras 24, 26 are offset vertically, the bottom camera will not see an area on a background object above the foremost object. [00152] There are different arrangements that can be used to reduce or eliminate the impact of these effects. [00153] The first is to position the optic axes of the two imaging lenses (e.g., the first optic 74 and the second optic 76) such that they are collinear. A way to achieve this alignment is illustrated in Fig. 3, for example.
As shown in Fig. 3, the first and second cameras 24, 26 are positioned mechanically in 6 degrees of freedom (“DOF”). The precision of this alignment can vary, dependent on the requirements of the particular use. For example, the transverse positioning could be offset by 1 pixel or less, by up to 5 pixels, by up to 20 pixels, by up to 100 pixels, or larger amounts. One image can be rotated with respect to the other by similar amounts. The error in the parallelism of the optic axes can be < 1 microradian, < 20 microradians, < 100 microradians, < 1 milliradian, or larger amounts. The position along the optic axes (the depth) can be adjusted, depending on the settings of each lens. In addition, software or mathematical coordinate transformations can be used in conjunction with mechanical positioning to improve the accuracy of the alignment between the images or corresponding pixels on each sensor or set of sensors (or images obtained therefrom). [00154] While similar to the mechanical positioning of a stereo camera pair (such as illustrated in Figs. 1 and 2), the settings and desired alignments are quite different. For stereo cameras, it is required to offset the first and second cameras 24, 26 to duplicate the ocular spacing of the human viewer as well as adjust the convergence of the first and second cameras 24, 26 to a desired distance from the first and second cameras 24, 26. These alignments are not required in this case. [00155] As discussed in connection with Fig. 3, the first and second cameras 24, 26 are positioned such that the IR light (e.g., the nonvisible light 40) is reflected to the second (depth) camera 26, while the visible light (e.g., the light 32) is transmitted to a color or monochrome camera, in this case the first camera 24. The timing of the shutters of the first and second cameras 24, 26 can be synchronized so that the correspondence between each image of each camera is known. The spectral characteristics of the splitter optics (e.g., the fourth optic 84) can be reversed so the IR light (e.g., the nonvisible light 40) is transmitted to the depth camera (the second camera 26). Additional cameras can be added to the combination if desired by using suitable optical components to divide the beam into suitable paths. [00156] An embodiment of the present invention would be to position the nonvisible light illumination source 20 to be pointed towards the scene 12, possibly with beam shaping optics to match one or the other of the fields of view of the first and second cameras 24, 26. In particular, the illumination pattern can be aligned to match the field of view (“FOV”) of the second (depth) camera 26. The timing of the nonvisible light illumination source 20 will then be synchronized to the timing of the second (depth) camera 26. [00157] Another approach to reducing the effect of parallax is to use three or more cameras (not illustrated). In an embodiment using three or more cameras, it is contemplated to employ two second (depth) cameras 26 that are placed on either side of a first (color or monochrome) camera 24. The two second (depth) cameras 26 may be placed symmetrically about the first (color) camera 24. In addition, the two second (depth) cameras 26 may be placed in close physical proximity to the first (color) camera 24.
For example, the second optics 76 (e.g., the lens) of the second (depth) cameras 26 may be within 3 cm of the first optics 74 (e.g., the lens) of the first (color) camera 24, within 10 cm, within 20 cm, within 30 cm, more than 30 cm, more than 50 cm, more than 1 m, more than 2 m, or a larger separation distance. The axes of the lenses (of the first and second optics 74, 76) may be roughly parallel or may converge to a determined point or region. The optic axes of the three lenses (e.g., the first and second optics 74, 76) may lie in a single plane or may all lie in different planes as is desired for the use specifics. They may also be placed so that the second (depth) cameras 26 are not symmetric about the first (color) camera 24. [00158] As illustrated in Figs. 1 and 2, the nonvisible light illumination source 20 may be provided by a single illumination source or may be the result of two or more illumination sources. If there is more than one illumination source, the illumination pattern between the sources may overlap, may be parallel, or may overlap partially, or be positioned to overlap minimally. The timing and synchronization may be set such that the two second (depth) cameras 26 are approximately coincident in time, offset by a known value, or set to minimize any overlap (for example, to be offset by the second (depth) camera 26 shutter length, by between 1X and 2X the shutter length, or by between 2X and 4X the shutter length). Other intervals between the shutters or illumination may be used as desired. [00159] Software may be used to further improve the depth solutions from the second (depth) cameras 26, each of which produces a 2D depth map (the second digital map 42) and other data. For example, the photogrammetric solution based on triangulation of the common surface location may be calculated separately from the intrinsic depth maps, and then the two may be combined mathematically to improve the 3D location accuracy and precision. Alternatively, one or the other solution may be used as a weighting factor or guide to improve the speed of the overall 3D location solutions. In addition, other techniques may be used to combine multiple measurements of the same surface such that the resulting 3D location values are more accurate or more precise or can be effectively represented by a smaller size data set. [00160] Other embodiments may have three or more second (depth) cameras 26 placed as described above. [00161] In another contemplated variation of the optical systems 10, 70, 78 of the present invention, two or more first (color) cameras 24 may be employed. There may be a greater number of first (color) cameras 24 than second (depth) cameras 26. Alternatively, the placement of the first (color) camera(s) 24 with respect to the second (depth) camera(s) 26 as described above may be reversed. [00162] The output of any of these configurations can be used in a real-time workflow with any software calculations being done with a latency that may be < 1 ms, < 10 ms, < 50 ms, < 200 ms, < 1 sec, < 5 sec, or < 30 sec on appropriate computer hardware (for example, a CPU, a GPU, an FPGA, an ASIC, an ISP, a DSP, or other similar compute platform, or a combination of any of the above). Alternatively, the output can be saved to disk or another storage option (e.g., the memory 58) and used at a later time. [00163] For all of these scenarios, the relative timing between the first and second cameras 24, 26 may be set in a variety of ways.
The timing of the second (depth) cameras 26 can be synchronized to use the same illumination pattern or even a common illumination. They may be synchronized such that the illumination and the camera shutter do not overlap. The second (depth) cameras 26 may be timed to occur at the front of the other camera shutter, in the middle, at the end, or at another arbitrary time position. These camera timings may be set to be in sync with other tracking or LED display systems or to be out of sync to minimize interference, as desired. Registration [00164] When more than one first (color) camera 24 and second (depth) camera 26 is used, a process may be employed to measure the relative location and pose of each camera 24, 26 (6 degrees of freedom or 6DOF). This process may be performed once for a configuration or may be updated frequently or constantly based on other available information. [00165] The result of this process is a mathematical correlation between pixels of the two cameras 24, 26 or sensors that depends on lens characteristics, spatial orientation of the two cameras, and the distance between the real surface and the camera 24, 26. [00166] The relationship can be used to transform or map the depth map (e.g., the second digital map 42) onto an equivalent grid that corresponds to the pixels of the other camera, such as the first camera 24. Alternatively, its inverse can be used to map the pixels of the other camera, such as the first (color) camera 24, onto the pixels or depth map from the 3D oTOF camera (the second (depth) camera 26). The presence of high point density depth measurements over large ranges makes this process more robust and more accurate than existing products or solutions. 3D oTOF cameras (e.g., second (depth) cameras 26) also provide information so that this process works well over large volumes, for example, > 3 m across, > 5 m across, > 10 m across, > 15 m across, > 20 m across, > 30 m across, or larger. Multiple camera (volumetric capture & constrained photogrammetry) [00167] The concept of using multiple cameras 24, 26 to acquire more information about a scene 12 or action within a scene 12 can be expanded to ultimately capturing all surface location information about any object 14, 16, 18 or surface in an area or volume. The final product of such an endeavor is often referred to as volumetric video or hologram images in some marketing material (even though such results are very different from a true hologram). [00168] However, it is difficult to achieve this via the prior art. For example, in the movie “The Matrix,” a special rig built with over 60 cameras was used to record images and calculate 3D location information in a single plane. The process used for “The Matrix” is commonly referred to as “bullet time.” Intel built a studio specifically for this purpose. The studio was 10,000 square feet in area and used 100 cameras placed around the perimeter. It required a supercomputer to operate and 1 week of processing for a 10 sec movie clip. [00169] Other large companies such as Microsoft and Canon have built similar studios to create volumetric video or 3D information with color. [00170] Other smaller solutions have been built and are being used, but they all involved using approximately 100 – 200 cameras and then processing the captured images over hours to days to get short clips. These smaller setups limit any scene of interest to 1 – 2 m in diameter—a single person or perhaps two.
Attempts to use 3D depth cameras to improve the performance or reduce the number of cameras needed have not worked well. Very small single person volumes have been shown with poor or erratic results because of the limitations in range and resolution of any video-oriented 3D camera solution today. Volumetric video capture using 3D cameras has been viewed as not useful with current solutions and approaches because of low or slow performance, limited resolution for important features, or limited operating distances of stage/scene volumes. [00171] The higher resolutions, ranges, and appropriate speed of capture from oTOF 3D cameras, such as the cameras 24, 26 of the present invention, now make it possible to develop practical volumetric capture solutions. The capability described in the previous section can be expanded to include additional cameras in or around an area or volume to capture 3D location information of all surfaces in an area or volume. The increased operating distances for oTOF can make it possible to create volumetric stages or areas that are, for example, 3 m or more, or 5 m or more, or 10 m or more, or 50 m or more, or 100 m or more. The increased resolution available with oTOF, e.g., as embodied in the optical systems 10, 70, 78 of the present invention, becomes important as well because the projected area of a pixel becomes larger with the square of the distance. Greater distances require more pixels or denser points to have adequate performance for a given object size, for example, arms or legs or fingers or similarly sized non-human objects. [00172] To implement this, the first and second cameras 24, 26 are arranged around the perimeter of an area or volume. For some projects, the first and second cameras 24, 26 could be placed at locations within the volume to capture specific areas. For projects with small numbers of objects 14, 16, 18 or significant space between them (spaced > 5% of the volume diameter or transverse distance, or > 10%, or > 20%), 4, 5, or 6 cameras 24, 26 may be placed around the area or volume. Each camera 24, 26 may be placed so that there are minimal or no significant obscured surfaces for the locations and ranges of motions planned. The cameras 24, 26 may be placed to concentrate the fields of view (“FOV”) close to a preferred plane through the volume, such as a horizontal plane at a particular height above the floor. Or the cameras 24, 26 may be placed approximately uniformly around a partial sphere around the volume of interest. Or the cameras 24, 26 may be placed at different radii (or the equivalent) from an approximate center of the volume or approximate location of interest. Or any other arrangement may be used that reduces the likelihood of obscurations or increases resolution or point density for volume segments of more interest for the project. [00173] If there are some obscured surfaces, software and other information from even 2D cameras can be used to fill in the location information and even color information for obscured surfaces. The color or image information (which could be monochrome or other parts of the electromagnetic spectrum) can be correlated to the depth coordinates in a similar fashion to the single camera and color camera described above. This correlation can be tracked in software such that the combined volumetric location data will be correlated to at least one and possibly more pixels from the color or other imagery.
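As a rough planning aid for the perimeter arrangements discussed in paragraph [00172], the following Python sketch computes evenly spaced camera positions on a ring around a capture volume and the yaw and pitch needed to aim each camera at a common target point. The radius, mounting height, and target point are project-specific assumptions, not values taken from the disclosure.

    import math

    def perimeter_camera_poses(n_cameras, radius_m, height_m, target=(0.0, 0.0, 1.0)):
        """Place n cameras evenly around a circle and aim each at a common target point.

        A rough layout helper for the kind of 4-6 camera arrangement discussed above.
        Returns a list of (position, yaw_deg, pitch_deg) tuples in a simple world frame
        with z up and the volume centred at the origin.
        """
        poses = []
        tx, ty, tz = target
        for i in range(n_cameras):
            angle = 2.0 * math.pi * i / n_cameras
            px, py, pz = radius_m * math.cos(angle), radius_m * math.sin(angle), height_m
            yaw = math.degrees(math.atan2(ty - py, tx - px))
            horiz = math.hypot(tx - px, ty - py)
            pitch = math.degrees(math.atan2(tz - pz, horiz))
            poses.append(((px, py, pz), yaw, pitch))
        return poses

    # Example: five cameras on a 6 m radius ring, mounted 2.5 m high, aimed 1.2 m above centre.
    for pose in perimeter_camera_poses(5, 6.0, 2.5, target=(0.0, 0.0, 1.2)):
        print(pose)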
[00174] Software can be used to combine the 3D location data and imagery from all cameras into a common reference frame and to select or mathematically combine (for example, using the average, the median, or another method) multiple measurements of a single point or region to obtain a single value (or 3D triplet of values) for the location information for that region. The same can be done to track the correlated image information in a way that can be used for subsequent display of the image information in 3D render software (for example, texture map the image information over a 3D mesh generated from the location information, or display a 2D image associated with a particular viewpoint, or create a synthetic image based on a set of correlated images that represents what would have been observable from a particular viewpoint in 3D space). [00175] For other projects or stages, additional cameras 24, 26 can be used to increase the ability of the system to capture 3D location data for objects 14, 16, 18 that are closer together or when there are more objects 14, 16, 18 in the scene 12. For example, additional cameras may be used when the object spacing is < 2% of the area diameter, < 5%, < 10%, or < 20%, and when the objects occupy > 0.1% of the total horizontal area (or an equivalent metric along any other plane through the volume), or > 0.5%, or > 1%, or > 2%, or > 5% of the area. In all cases, the total number of cameras 24, 26 required to achieve a desired level of 3D location point density, capture volume extent, and operating distance will be significantly lower than any current approach (e.g., any approach provided by the prior art). [00176] Additionally, the oTOF 3D camera systems (e.g., the optical systems 10, 70, 78 of the present invention) will provide high density 3D location measurements as described elsewhere. But the constellation of cameras 24, 26 provides different measurements from different locations, and so photogrammetric 3D location solutions can also be calculated. This photogrammetric calculation will be significantly faster than current photogrammetry solutions because the oTOF depth values provide starting values. These values may be used as weights or limits to speed the photogrammetric calculations. The combination of the 3D location determinations will increase the performance of the overall system, increasing volume working size or decreasing the minimum feature size that can be supported by any volumetric system that uses oTOF cameras 24, 26. Depth keying [00177] It is desirable to select elements of a scene 12 to display or composite together with other elements recorded separately or with computer generated (CG) elements. The task of separating or segmenting such elements is referred to as keying. It is traditionally done by manually rotoscoping the elements (tracing the outline of the object(s) of interest either physically in film or digitally) or using green or blue backgrounds (as illustrated in Fig. 8). The color is then used as the key to determine foreground or background (known as chromakey). But this requires the extra cost of constructing the special sets or environments and the additional equipment and software to automate the segmentation process. There can also be extra cost and complexity to remove the green or blue tinge from the color images or to match the colors shot at different times or in different sets. It is also difficult to do well with low latency.
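As a minimal sketch of the kind of multi-camera combination described in paragraph [00174], the following Python example transforms each camera's point cloud into a common world frame using assumed camera-to-world matrices (such as those produced by the registration process described earlier) and averages points that fall in the same voxel. The voxel size and the averaging rule are illustrative choices only, not values or methods taken from the disclosure.

    import numpy as np

    def merge_point_clouds(clouds, poses, voxel_size_m=0.01):
        """Transform each camera's points into a common world frame and merge them.

        `clouds` is a list of N_i x 3 arrays in each camera's own frame; `poses` is a
        matching list of 4 x 4 camera-to-world matrices.  Points from different cameras
        that land in the same voxel are averaged (a simple running mean).
        """
        world_points = []
        for pts, pose in zip(clouds, poses):
            homog = np.hstack([pts, np.ones((len(pts), 1))])
            world_points.append((homog @ pose.T)[:, :3])
        allpts = np.vstack(world_points)

        keys = np.floor(allpts / voxel_size_m).astype(np.int64)
        merged = {}
        for key, p in zip(map(tuple, keys), allpts):
            if key in merged:
                n, mean = merged[key]
                merged[key] = (n + 1, mean + (p - mean) / (n + 1))   # running average
            else:
                merged[key] = (1, p.copy())
        return np.array([mean for _, mean in merged.values()])

    # Example: two single-point "clouds" seen by two cameras related by a 1 m x-translation.
    identity = np.eye(4)
    shifted = np.eye(4); shifted[0, 3] = 1.0
    print(merge_point_clouds([np.array([[1.0, 0.0, 2.0]]), np.array([[0.0, 0.0, 2.0]])],
                             [identity, shifted]))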
[00178] Instead of using color or another manual process to key the elements of interest, depth (or 3D location) can be used to provide a key (“depth key”) to distinguish between objects to keep and objects to remove. The depth or 3D data can be used as a key in corresponding 2D images or other data/information in multiple ways. For example, a single depth value can be used. Any pixel in the 2D image with a corresponding depth value greater than a certain value can be assigned to background or assigned to be transparent or otherwise differentiated for later processing (for example, using the alpha channel in a computer display). Alternatively, the depth or 3D location value can be compared to a plane or other geometric shape and assigned a keying or mask or matte value depending on whether the 3D location is on one side of the geometric shape or the other. [00179] Fig. 6 illustrates this. The top image 98 shows a color image with associated depth or 3D location information (such as a mesh). The correspondence between these can be handled via UV coordinates (as known in the art of computer graphics). The second image 100 in Fig. 6 shows the same color image, but the 3D location information was compared with two planes placed slightly in front of the two walls in the color image. The color pixels with 3D data behind the planes are not displayed so that the walls disappear. These “foreground objects” can then be displayed within a CG environment as if they were also CG objects. The third image 102 in Fig. 6 shows selected elements superposed in front of a new background. [00180] Fig. 7 provides an image 104 that shows a more complex result where the two actors and chair have been keyed using depth location and planes and are displayed in a CG environment with a CG table and items on the table. A real table (seen in the greyscale depth map in the inset) was keyed out using a series of parallelepiped shapes to segment the table top and table legs so that it is not displayed in the final image. The real cups can then be placed on the real table, but they look like they are resting on the digital table in the final output. [00181] The software or hardware used to create the final product can vary, depending on the project needs. It can be done in real-time using fast solutions such as a game or render engine such as Unity or Unreal, or other custom built software. The information can be transmitted via a computer network or saved to and read from computer files. The combination can be done using computer software such as Nuke or Maya or Houdini or other similar software for manipulating or combining image or 3D information. Background replacement [00182] One use of keying has become more prevalent in recent years with the advent of large LED walls that provide a controllable and manipulable background during recording or streaming of imagery and video. However, this displayed background may need to be corrected for a variety of reasons, which may be known before it is displayed or discovered after the fact. It may be desirable to replace all or part(s) of the background imagery during the shooting or recording or do so afterwards. [00183] Interlaced green frames can be displayed to provide a background for chromakeying the background from the foreground. This requires more expensive equipment, and there can still be errors for moving objects. This can result in costly manual operations as well as require more time than is available.
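As a minimal sketch of the depth keying described in paragraph [00178], the following Python example builds an alpha mask from per-pixel 3D locations using a depth threshold and/or a plane test, and composites the keyed foreground over a replacement background. The threshold, plane, and array layouts are illustrative assumptions rather than parameters from the disclosure.

    import numpy as np

    def depth_key_alpha(points_xyz, max_depth_m=None, plane=None):
        """Build a 0/1 alpha mask for an H x W image from per-pixel 3D locations.

        `points_xyz` is an H x W x 3 array of camera-frame coordinates (for example,
        back-projected from the depth map).  Two keying rules are sketched: keep pixels
        whose depth (z) is below `max_depth_m`, and/or keep pixels on the positive side
        of a plane given as (a, b, c, d) with a*x + b*y + c*z + d >= 0.
        """
        h, w, _ = points_xyz.shape
        alpha = np.ones((h, w), dtype=np.uint8)
        z = points_xyz[..., 2]
        if max_depth_m is not None:
            alpha &= (z <= max_depth_m).astype(np.uint8)
        if plane is not None:
            a, b, c, d = plane
            signed = a * points_xyz[..., 0] + b * points_xyz[..., 1] + c * z + d
            alpha &= (signed >= 0).astype(np.uint8)
        alpha &= (z > 0).astype(np.uint8)   # pixels with no valid depth become background
        return alpha

    def composite_over_background(foreground_rgb, background_rgb, alpha):
        """Composite keyed foreground pixels over a replacement background."""
        a = alpha[..., None]
        return foreground_rgb * a + background_rgb * (1 - a)

    # Example: keep everything nearer than 3 m and in front of a wall plane at z = 4 m.
    # alpha = depth_key_alpha(points_xyz, max_depth_m=3.0, plane=(0.0, 0.0, -1.0, 4.0))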
[00184] The depth key described earlier can be used instead of requiring interleaved green frames. [00185] A depth key may also be used regardless of the background or surroundings that were present during the shooting or recording. The key can be applied in real-time with low latency as illustrated by the figures in the present document, which are screen captures of the live display. Latencies below 100 ms have been achieved. For various projects, being able to achieve the depth keyed replacement in < 5 sec is sufficient. Alternative latencies include < 2 s, < 1 s, < 500 ms, < 200 ms, or < 100 ms. Ultimately, the depth key may be used so that lower cost, coarser pitch display segments can be used for a background display wall, the computer-generated (CG) background in the display virtual camera is just the original CG data, and the LED wall provides primarily lighting. Alternatively, with additional software capability, the in-camera visual effects allowed by the LED walls may be done entirely with a depth key from 3D oTOF cameras (e.g., the second cameras 26), and no physical display is needed. [00186] The CG background that can be used from a depth key does not have to correspond to any LED wall size or the physical FOV of the real camera. For example, the backgrounds in the Figures in the previous section are entire 3D worlds of which only a small part is visible at one time in the display virtual camera. The amount of a large CG background that is displayed can be controlled via a computer control, simulating a zoom capability digitally. This background can be smaller than the physical camera FOV, similar in size, or larger. Real-time preview (“Simulcam+”) [00187] It is often desirable to be able to view some representation that will be close to the final output. Optical or digital viewfinders provide that capability for traditional cameras, and now the digital camera output can often be viewed on a smartphone, tablet, computer, or other remote viewing device. When computer-generated content is mixed in some fashion with live or real elements, this becomes difficult, especially in real-time. For example, it is difficult to display the mixed output such that real objects are behind the computer generated (CG) elements. Often, real objects must be placed in specific locations or measured carefully to determine which surfaces will be displayed. [00188] For film and television productions, a technique known as “simulcam” or, more loosely, pre-visualization has been developed so that computer generated elements such as digital set extensions and skins can be displayed on a monitor along with the live action. However, in general, the CG elements always need to be displayed on top of the real elements (or always visible). This means that many effects or elements cannot be displayed in this type of solution. It is also difficult to always set the scale between real and CG elements since the location of the real elements relative to the CG elements is not generally known. [00189] A depth map (or other 3D location information of the pixel surfaces) (e.g., the second digital map 42) may be used to provide an improved preview solution. The 3D surface location provides the necessary data to determine whether to display the CG or real surface anytime there is an overlap. The location information also provides the necessary information to determine the relative and absolute scales of the CG and real elements.
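As a minimal sketch of the occlusion decision described in paragraph [00189], the following Python example compares a real depth map against a CG depth buffer and, for each pixel, keeps whichever surface is nearer. The depth buffer, color layouts, and example values are assumptions for illustration; a render engine would normally perform this comparison internally.

    import numpy as np

    def composite_real_and_cg(real_rgb, real_depth_m, cg_rgb, cg_depth_m):
        """Per-pixel occlusion test between live footage and rendered CG elements.

        Wherever both a real surface and a CG surface project to the same pixel, the
        nearer of the two depths wins, so real objects can correctly appear in front
        of CG elements and vice versa.  `cg_depth_m` would come from the render
        engine's depth buffer; pixels with no CG surface should carry +inf depth.
        All array shapes are H x W (x 3 for the color images).
        """
        real_wins = real_depth_m <= cg_depth_m            # True where the real surface is nearer
        return np.where(real_wins[..., None], real_rgb, cg_rgb)

    # Example: one real pixel at 2.1 m in front of a CG pixel at 3.0 m keeps the real color.
    real = np.array([[[200, 180, 160]]], dtype=float); cg = np.array([[[0, 0, 255]]], dtype=float)
    print(composite_real_and_cg(real, np.array([[2.1]]), cg, np.array([[3.0]])))  # real color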
[00190] Coupled with software that can track pre-determined or automatically determined elements, such as hands, arms, legs, feet, fingers, head, face, or similar, CG skins may be placed on real objects and follow those objects during any image or video capture. In fact, low latency (< several frames lag from when the event occurred) depth data allows essentially all visual effects to be displayed in real-time or approximately real-time. This capability could be referred to as Simulcam+. [00191] Prior depth capture solutions have been ineffective at accomplishing this and could lead to the conclusion that this is not a desirable approach. They have been much too slow, much too coarse, or only effective at very short operating distances. However, the present invention uses much higher density depth maps that can be captured at any point in typical working volumes to achieve a viable level of performance. Fig. 8 provides an image 106 that shows two real actors and two real props being inserted directly into a CG environment and a CG object inserted between the body and hands of one of the actors. The real objects are displayed correctly in front of the CG elements. Shadows and other desirable effects are also cast correctly from digital lighting. Any other digital effect can be added to the display. [00192] Timing and other parameters can be set as described above. Relighting, color grading, and shadow generation [00193] There are times when a project may desire to either change the color or lighting or shadows in the final output, different from what was recorded in a real set. It can happen that things change or that the presence of CG elements makes it important to adjust how the real objects appear. Or the real objects may impact the desired appearance of the CG objects. Terms for this are color grading or relighting or other similar terms. [00194] The way that lighting, as well as reflections or shadows from other objects, impacts the appearance of an object depends heavily on the location or proximity of the objects and the light sources. Thus, changing lighting or shadows, or including or changing CG elements, often becomes a manual and costly process unless the locations of all surfaces and objects are known. [00195] The state-of-the-art today is that precision laser scanners are used to capture the location of static elements on a set or in a scene. However, these objects can often be moved during the project or new objects added or some objects taken away. Moving objects, such as the actors, balls, cars, or other objects, by definition are changing and cannot be captured with this process. Still images or other assist 2D cameras may provide information for photogrammetric location solutions or serve as guides for the manual process. But there is no robust solution today. [00196] 3D oTOF cameras (e.g., the second cameras 26) can be used to solve this problem. Because the location information correlated to all or nearly all visible surfaces in the images or video or video stream is known or can be inferred via software (as discussed hereinabove), this 3D location data can be used either in real-time or during any post-recording work. For example, in Fig. 9, the first image 108 shows a daylight scene in the shadow of CG or digital trees. The second image 110 shows a nighttime scene lit by candlelight. The long range and high resolution (coupled with video rates) make it possible to apply the correct lighting in all of these situations.
[00197] In this example, the candles are lighting the CG table and the real female actor because she is in close proximity to them and is facing the display virtual camera. The male actor is not lit by the candles because he is farther away, and what little illumination he receives falls on his side and is not visible to the display virtual camera.

[00198] Fig. 8 shows the shadows created by the real objects in the CG environment. These shadows move, based on ray-tracing or similar calculations, as expected when the digital light source is moved. The 3D location information is necessary for these calculations, whether as points with a small area associated with them or as a mesh of polygons; other representations of the 3D location data may also be used. Similarly, CG objects cast shadows on the real objects, with the shadow shape determined by the location of the CG object and the shape or contour of the real object. High-resolution 3D location data makes this possible at this level of fidelity and for large areas or scenes.

[00199] The oTOF 3D location data, as provided by the second digital map 42, for example, also provides the necessary information to create the appropriate lighting or shadow look if additional lights are added to the scene. The lights could be standard light sources, such as a light bulb, or more exotic ones, such as a glowing fairy, a fireball, or any other source of light. The video rates of the high-resolution 3D data make it possible to correctly apply the lighting effects even if the CG light or the real object is moving.

[00200] Other projects may wish to change or adjust the lighting on real objects after the images or video are recorded, or even live, because of some creative desire. The 3D data may be readily used in software to change the lighting (or color) characteristics of a region so that the distance-dependent effects of light look as they should. For example, a foreground light would brighten objects close to the light more than objects in the background that are farther from it.

[00201] This process can be completed more quickly than with current solutions because of the resolution, range, and speed of 3D oTOF cameras.

Real-CG interaction

[00202] Render software, such as game engines or similar software, provides mechanisms to check whether different meshes collide or overlap. Features like this may be combined with the 3D location information from 3D oTOF cameras (e.g., the second cameras 26) to create interactive effects. The low-latency, high-speed nature of the 3D data, along with the working range and high resolution, is valuable for performing effects like this in real time. Current solutions are limited to well-rehearsed, simple motions with specific objects and locations, and they often fail anyway; it is not practical to produce such effects on a regular basis with current solutions.

[00203] For example, if a CG element is present in a displayed scene, it has an associated mesh of location data. The 3D oTOF location data of real objects, such as a paddle or an actor, can also be represented by a mesh of polygons or another similar representation. Each mesh is associated with the displayed image or texture via UV coordinates or a similar technique (or colorized points can be used). Mesh collision detection algorithms, or similar approaches as appropriate, can be used to detect when the two meshes begin to overlap.
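By way of illustration only, the following sketch shows one simple overlap test between a depth-derived point cloud for a real object and an axis-aligned bounding box standing in for a CG object's collision volume, together with one response rule of the kind enumerated in the next paragraph. The class and function names, the bounding-box simplification, and the push-away rule are assumptions for this sketch; production render software would typically use its own mesh collision facilities as described herein.

```python
import numpy as np

class CGBox:
    """Axis-aligned bounding box standing in for a CG object's collision mesh."""

    def __init__(self, center, half_size):
        self.center = np.asarray(center, dtype=float)        # (3,) in meters
        self.half_size = np.asarray(half_size, dtype=float)  # (3,) in meters

    def contains(self, points):
        """Boolean mask marking which 3D points lie inside the box."""
        return np.all(np.abs(points - self.center) <= self.half_size, axis=-1)

def check_and_push(cg_box, real_points, real_velocity, dt):
    """Detect overlap with a real object's point cloud and push the CG box away.

    real_points   : (N, 3) points derived from the oTOF depth data for the real object
    real_velocity : (3,) estimated velocity of the real object in m/s
    dt            : frame interval in seconds
    Returns True if an overlap was detected this frame.
    """
    inside = cg_box.contains(real_points)
    if not np.any(inside):
        return False

    # One possible response rule: move the CG object along the mean penetration
    # direction at the real object's speed.  Other rules (physics models, random
    # impulses, disappearing, catching fire) could be substituted here.
    direction = cg_box.center - real_points[inside].mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm > 1e-9:
        direction /= norm
    cg_box.center += direction * np.linalg.norm(real_velocity) * dt
    return True
```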
Software can then be used to move the CG mesh according to defined rules: for example, move the CG mesh away from the real-object mesh at the same speed the real-object mesh was moving; apply a physical force-and-response mathematical model; apply a random direction and velocity vector; or apply an acceleration vector according to the length of the mesh overlap, the real-object mesh velocity vector, or a weight or mass coefficient. Alternatively, the CG object could disappear from view, catch fire, or exhibit other types of effects.

[00204] More generally, various mathematical or physical models may be applied to govern what happens to the CG mesh, or even to aspects of the real object, when the two meshes begin to overlap or overlap according to some rules. The result is an ability for real objects to interact with CG objects in ways that may look realistic or fanciful, as desired.

Volumetric effects (e.g., digital smoke)

[00205] High-precision depth means that effects such as digital smoke and fog can obscure some objects and not others, based on optical path distances. Fig. 10 provides an image 112 that shows an example of digital fog coming in from the background: the woman is obscured by the fog before the man in the foreground is. These effects depend on the 3D locations of the various objects and on the desired effect, similar to the relighting described above.

Digital focus adjust

[00206] Another example of effects that can be readily accomplished using the improvements in the 3D location data from a 3D oTOF camera (e.g., the second camera 26) is illustrated in Fig. 11. Image 114 shows the as-captured color image that has been depth keyed and placed in a CG world. Image 116 shows the same data, but a digital blur has been applied in the display virtual camera, depending on the distance from the virtual camera. This simulates the effect of a fast camera lens with a narrower depth of field. It can be done in this instance because the oTOF camera provides the needed distance information for every frame with low latency, at long range (5 – 7 m in this example), and at high resolution.

[00207] Such digital blur can be applied to objects in any type of virtual or real space. CG objects could be chosen to be affected differently than real objects, and specific objects may be chosen to be affected differently than the rest (for example, never blurred). The effect can be applied to all objects (real or CG), to objects between 2 m and 20 m, between 3 m and 30 m, between 2 m and 10 m, beyond 10 m, beyond 20 m, between 2 m and 5 m, or between 5 m and 10 m.

[00208] As discussed hereinabove, the embodiments of the present invention are exemplary only and are not intended to limit the present invention. Features from one embodiment are interchangeable with other embodiments, as should be apparent to those skilled in the art. As such, variations and equivalents of the embodiments described herein are intended to fall within the scope of the claims appended hereto.

Claims

1. A system for generating a three dimensional, digital representation of a scene, comprising:
a nonvisible light illumination source to generate and to illuminate the scene with nonvisible light;
a first camera, wherein the first camera is adapted to receive light from the scene and generate a first digital map therefrom, wherein the first digital map comprises a plurality of first camera pixels with a first resolution of about 100K – 400M of the first camera pixels, and wherein each first camera pixel of the plurality of first camera pixels is associated with a color or with a grayscale; and
a second camera, wherein the second camera is adapted to receive the nonvisible light from the scene, the nonvisible light having been generated by the nonvisible light illumination source, and to generate a second digital map therefrom, wherein the second digital map comprises a plurality of second camera pixels with a second resolution of about 100K – 400M of the second camera pixels, wherein each second camera pixel of the plurality of second camera pixels is associated with a depth, and wherein the depth is determined as a function of optical time of flight information for the nonvisible light;
wherein the first digital map and the second digital map are generated synchronously such that there is a scenic correlation therebetween;
a processor connected to the first camera and to the second camera, wherein the processor receives the first digital map and the second digital map, and wherein the processor combines the first digital map with the second digital map to generate the three dimensional, digital representation of the scene; and
wherein the three dimensional, digital representation of the scene satisfies the following:
an image resolution with an image pixel count of greater than or equal to 0.9 M, wherein the image pixel count comprises a number of final pixels comprising the three dimensional, digital representation of the scene,
an image latency of 10 ms – 30 sec, wherein the image latency comprises a processing time between receipt of the first and second digital maps by the processor and generation of the three dimensional, digital representation of the scene, and
a distance of 10 cm – 200 m, wherein the distance is measured from the first and second cameras to an object in the scene.
2. The system of claim 1, further comprising: a display connected to the processor to display the three dimensional representation.
3. The system of claim 1, further comprising: a memory connected to the processor to save the three dimensional, digital representation of the scene.
4. The system of claim 1, wherein successive ones of the three dimensional, digital representations of the scene are assembled by the processor to generate a video with a frame rate of 5 fps – 250 fps.
5. The system of claim 4, wherein, for movie production, the three dimensional, digital representation of the scene satisfies at least one of: the distance is 10 cm – 50 m, and the frame rate is 23 – 250 fps.
6. The system of claim 2, wherein, for pre-visualization (pre-viz) of scenes for movies, the three dimensional, digital representation of the scene also satisfies: the image latency is 10 ms – 10 s, the distance is 10 cm – 50 m, and the frame rate is 23 – 250 fps.
7. The system of claim 4, wherein, for live broadcast events, the three dimensional, digital representation of the scene satisfies at least one of: the image latency is 10 ms – 1 s, the distance is 2 m – 200 m, and the frame rate is 23 – 250 fps.
8. The system of claim 1, wherein, for volumetric capture, the three dimensional, digital representation of the scene satisfies at least one of: the image latency is 10 ms – 1 s, the distance is 1 m – 50 m, and the frame rate is 5 – 250 fps.
9. The system of claim 1, wherein, for static capture, the three dimensional, digital representation of the scene satisfies at least one of: the image latency is 10 ms – 1 s, the distance is 50 cm – 200 m, and the frame rate is 5 – 250 fps.
10. The system of claim 1, wherein the first camera and the second camera are housed within a single housing.
11. The system of claim 10, wherein the single housing comprises a beam splitter to direct the light to the first camera and the nonvisible light to the second camera.
12. The system of claim 1, wherein the first camera comprises a plurality of first cameras.
13. The system of claim 1, wherein the second camera comprises a plurality of second cameras.
14. A system for generating a three dimensional, digital representation of a scene, comprising:
a nonvisible light illumination source to generate and to illuminate the scene with nonvisible light;
a first camera, wherein the first camera is adapted to receive light from the scene and generate a first digital map therefrom, wherein the first digital map comprises a plurality of first camera pixels with a first resolution of about 100K – 400M of the first camera pixels, and wherein each first camera pixel of the plurality of first camera pixels is associated with a color or with a grayscale; and
a second camera, wherein the second camera is adapted to receive the nonvisible light from the scene, the nonvisible light having been generated by the nonvisible light illumination source, and to generate a second digital map therefrom, wherein the second digital map comprises a plurality of second camera pixels with a second resolution of about 100K – 400M of the second camera pixels, wherein each second camera pixel of the plurality of second camera pixels is associated with a depth, and wherein the depth is determined as a function of optical time of flight information for the nonvisible light;
wherein the first digital map and the second digital map are generated synchronously such that there is a scenic correlation therebetween;
a processor connected to the first camera and to the second camera, wherein the processor receives the first digital map and the second digital map, and wherein the processor combines the first digital map with the second digital map to generate the three dimensional, digital representation of the scene; and
wherein the three dimensional, digital representation of the scene satisfies the following:
an image resolution with an image pixel count of greater than or equal to 0.9 M, wherein the image pixel count comprises a number of final pixels comprising the three dimensional, digital representation of the scene,
a distance of 10 cm – 200 m, wherein the distance is measured from the first and second cameras to an object in the scene, and
successive ones of the three dimensional, digital representations of the scene are assembled by the processor to generate a video with a frame rate of 5 fps – 250 fps.
15. The system of claim 14, further comprising: a display connected to the processor to display the three dimensional representation.
16. The system of claim 14, further comprising: a memory connected to the processor to save the three dimensional, digital representation of the scene.
17. The system of claim 16, wherein, for movie production, the three dimensional, digital representation of the scene satisfies at least one of: the distance is 10 cm – 50 m, and the frame rate is 23 – 250 fps.
18. The system of claim 14, wherein, for volumetric capture, the three dimensional, digital representation of the scene satisfies at least one of: the distance is 1 m – 50 m, and the frame rate is 5 – 250 fps.
19. The system of claim 14, wherein, for static capture, the three dimensional, digital representation of the scene satisfies at least one of: the distance is 50 cm – 200 m, and the frame rate is 5 – 250 fps.
20. The system of claim 14, wherein the first camera and the second camera are housed within a single housing.
21. The system of claim 20, wherein the single housing comprises a beam splitter to direct the light to the first camera and the nonvisible light to the second camera.
22. The system of claim 14, wherein the first camera comprises a plurality of first cameras.
23. The system of claim 14, wherein the second camera comprises a plurality of second cameras.
PCT/US2023/081670 2022-11-30 2023-11-29 Apparatus and method for real-time three dimensional imaging WO2024118828A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263429135P 2022-11-30 2022-11-30
US63/429,135 2022-11-30

Publications (1)

Publication Number Publication Date
WO2024118828A1 true WO2024118828A1 (en) 2024-06-06

Family

ID=89507608

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/081670 WO2024118828A1 (en) 2022-11-30 2023-11-29 Apparatus and method for real-time three dimensional imaging

Country Status (1)

Country Link
WO (1) WO2024118828A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8471895B2 (en) 2008-11-25 2013-06-25 Paul S. Banks Systems and methods of high resolution three-dimensional imaging
US9007439B2 2008-11-25 2015-04-14 Tetravue, Inc. Systems and method of high resolution three-dimensional imaging
US10218962B2 (en) 2008-11-25 2019-02-26 Tetravue, Inc. Systems and method of high resolution three-dimensional imaging
US8254009B2 (en) 2010-02-04 2012-08-28 Pv Labs, Inc. Optically powered electro-optical component
US8493645B2 (en) 2010-02-04 2013-07-23 Pv Labs, Inc. Optically powered optical modulator
US10104365B2 (en) 2014-04-26 2018-10-16 Tetravue, Inc. Method and system for robust and extended illumination waveforms for depth sensing in 3D imaging
US10437082B2 (en) 2017-12-28 2019-10-08 Tetravue, Inc. Wide field of view electro-optic modulator and methods and systems of manufacturing and using same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PEDRO ARIZPE-GOMEZ ET AL: "Preliminary Viability Test of a 3-D-Consumer-Camera-Based System for Automatic Gait Feature Detection in People with and without Parkinson's Disease", 2020 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), IEEE, 30 November 2020 (2020-11-30), pages 1 - 7, XP033886513, DOI: 10.1109/ICHI48887.2020.9374363 *
QM13: "Azure Kinect DK hardware specifications | Microsoft Learn", 9 February 2022 (2022-02-09), pages 1 - 12, XP093134217, Retrieved from the Internet <URL:https://learn.microsoft.com/en-us/azure/kinect-dk/hardware-specification> [retrieved on 20240222] *

Similar Documents

Publication Publication Date Title
US11354840B2 (en) Three dimensional acquisition and rendering
US11115633B2 (en) Method and system for projector calibration
Anderson et al. Jump: virtual reality video
US7182465B2 (en) Methods, systems, and computer program products for imperceptibly embedding structured light patterns in projected color images for display on planar and non-planar surfaces
US8867827B2 (en) Systems and methods for 2D image and spatial data capture for 3D stereo imaging
US8334893B2 (en) Method and apparatus for combining range information with an optical image
Matsuyama et al. 3D video and its applications
Grau et al. A combined studio production system for 3-D capturing of live action and immersive actor feedback
US20150294492A1 (en) Motion-controlled body capture and reconstruction
US11514654B1 (en) Calibrating focus/defocus operations of a virtual display based on camera settings
KR102010396B1 (en) Image processing apparatus and method
KR102067823B1 Method and apparatus for operating 2d/3d augmented reality technology
US20160266543A1 Three-dimensional image source for enhanced pepper's ghost illusion
CN113692734A (en) System and method for acquiring and projecting images, and use of such a system
US9897806B2 (en) Generation of three-dimensional imagery to supplement existing content
US20210374982A1 (en) Systems and Methods for Illuminating Physical Space with Shadows of Virtual Objects
WO2024118828A1 (en) Apparatus and method for real-time three dimensional imaging
Katayama et al. A method for converting three-dimensional models into auto-stereoscopic images based on integral photography
KR20050015737A (en) Real image synthetic process by illumination control
KR102654323B1 (en) Apparatus, method adn system for three-dimensionally processing two dimension image in virtual production
Bimber et al. Digital illumination for augmented studios
WO2023047643A1 (en) Information processing apparatus, image processing method, and program
WO2024042893A1 (en) Information processing device, information processing method, and program
Kasim et al. Glasses-free Autostereoscopic Viewing on Laptop through Spatial Tracking
CA3225432A1 (en) Image generation