WO2023001816A1 - Augmented-reality virtual camera (Caméra virtuelle à réalité augmentée) - Google Patents

Augmented-reality virtual camera (Caméra virtuelle à réalité augmentée)

Info

Publication number
WO2023001816A1
Authority
WO
WIPO (PCT)
Prior art keywords
real
video stream
video
augmented
camera
Prior art date
Application number
PCT/EP2022/070186
Other languages
English (en)
Inventor
Josua HÖNGER
Zoran ANGELOV
Markus Rossi
Original Assignee
Scanergy GmbH
Priority date
Filing date
Publication date
Application filed by Scanergy GmbH
Publication of WO2023001816A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality

Definitions

  • the invention relates to augmented reality technology. It relates to methods and apparatuses according to the opening clauses of the claims.
  • a wealth of technologies for generating an augmented video stream are known in the art.
  • the inventors contemplated a new way of generating an augmented video stream which enables simple and intuitive interaction by a user.
  • some embodiments of the invention can be carried out using current standard hardware, such as modern smartphones and laptop computers, plus dedicated software.
  • the invention can furthermore find application in video conferencing, namely the augmented video stream can be used as one participant’s video signal in today’s video conferencing software and thus be transmitted to further participants.
  • the method for generating an augmented video stream can comprise providing a real video camera; generating a first video stream comprising real-time video stream data from the real video camera; providing a real object; determining in real-time relative pose data indicative of a relative position in space and of a relative orientation in space of the real video camera and the real object; generating a second video stream comprising a representation of a virtual object; modulating the representation of the virtual object in the second video stream in real-time and in dependence of the relative pose data; and outputting and/or generating from the first video stream and the second video stream the augmented video stream in real-time.
  • the system (or combination) for generating an augmented video stream can comprise: a first video unit comprising a real video camera, configured to generate a first video stream comprising real-time video stream data from the real video camera;
  • a sensing unit configured to determine in real-time relative pose data indicative of a relative position in space and of a relative orientation in space of the real video camera and the real object;
  • a second video unit operationally connected to the sensing unit for receiving the relative pose data from the sensing unit and configured to generate a second video stream comprising a representation of a virtual object and to modulate the representation of the virtual object in the second video stream in real-time and in dependence of the relative pose data;
  • a video processing unit operationally connected to the first video unit and to the second video unit, configured to output and/or generate from the first video stream and the second video stream the augmented video stream in real-time.
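  • As an illustration only (not part of the original disclosure), the following minimal Python sketch shows how the four units listed above could interact in a real-time loop; the class names, the camera.read() interface and the RGBA rendering of the second video stream are assumptions made for this sketch.

```python
# Hedged sketch of the pipeline: first video unit (real camera), sensing unit
# (relative pose data), second video unit (virtual object), video processing unit
# (merging into the augmented video stream). All interfaces here are assumed.
import numpy as np

class FirstVideoUnit:
    """Wraps the real video camera; yields frames of the first video stream."""
    def __init__(self, camera):
        self.camera = camera
    def next_frame(self) -> np.ndarray:
        return self.camera.read()              # H x W x 3 RGB frame (assumed API)

class SensingUnit:
    """Determines, in real time, the relative pose of real camera and real object."""
    def relative_pose(self) -> np.ndarray:
        return np.eye(4)                        # 4x4 homogeneous transform (placeholder)

class SecondVideoUnit:
    """Renders the virtual object; the rendering is modulated by the pose data."""
    def render(self, relative_pose: np.ndarray, shape) -> np.ndarray:
        h, w = shape
        return np.zeros((h, w, 4), np.uint8)    # RGBA; alpha 0 = fully transparent

class VideoProcessingUnit:
    """Merges first and second stream frame-by-frame into the augmented stream."""
    def merge(self, first: np.ndarray, second_rgba: np.ndarray) -> np.ndarray:
        out = first.copy()
        visible = second_rgba[..., 3] > 0       # pixels that are not fully transparent
        out[visible] = second_rgba[..., :3][visible]
        return out

def run(first_unit, sensing_unit, second_unit, processing_unit, emit):
    while True:                                 # one iteration per frame, in real time
        frame1 = first_unit.next_frame()
        pose = sensing_unit.relative_pose()
        frame2 = second_unit.render(pose, frame1.shape[:2])
        emit(processing_unit.merge(frame1, frame2))
```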
  • the invention can make possible that the representation of the virtual object (more particularly: its representation in the augmented video stream) can be modulated in real time by moving the real object. And it is possible to accomplish this in a synchronized manner. Accordingly, a user can modify the augmented video stream, and more particularly the representation of the virtual object therein, in a very simple and intuitive way. Movements of the real object can be translated into (virtual) movements of the virtual object in the augmented video stream.
  • the real object can be used by a user as a controller for controlling (virtual) movements of the virtual object in the augmented video stream.
  • the invention can enable to create the illusion that the virtual object in the augmented video stream moves in real space exactly as a real-world item firmly connected to the real object would do.
  • the real video camera is a physically existing video camera; in contrast to a merely virtual camera.
  • it will typically comprise a photo sensor, such as semiconductor RGB imaging chip, and usually also one or more optical elements, such as one or more lenses.
  • it is, at least in most embodiments, unnecessary to move the real video camera, i.e. it can remain in one and the same position and orientation in space, i.e. remain in one and the same pose (the combination of position and orientation of an object is referred to as the “pose” of an object).
  • the real video camera is configured to image a (real; physically existing) scene within its field of view, typically in proximity to the real video camera.
  • the real video camera is a 2D video camera, such as a camera generating a sequence of color images.
  • the real video camera is a 2D-plus-depth video camera, such as a camera generating a sequence of color images (in one layer) containing (in another layer) depth information, i.e. at least a portion of the pixels bear information regarding a distance between the camera and an object imaged at the respective pixel.
  • the real video camera may comprise a processing unit for determining the depth information; and/or the depth determination can be based on stereo imaging, such that the real video camera may comprise two cameras and a processing unit for this purpose; and/or it can be based on other techniques, such as time-of-flight sensing, structured light imaging or LiDAR (light detection and ranging), in which cases the real video camera can comprise a light emitter, such as an infrared light source, e.g., for emission of structured light.
  • the real video camera is a 3D video camera (volumetric camera), such as a camera generating a sequence of volumetric data.
  • a video camera can comprise for this purpose, e.g., a processing unit and a plurality of subordinate video cameras viewing a scene to be imaged from different positions and/or angles.
  • the real video camera is a peripheral camera of a computing device, such as a camera with wireless, e.g., “Bluetooth” -based or wirebound, e.g., USB-based interconnectivity.
  • the real video camera is a built-in camera of a computing device, such as of a smartphone, a laptop computer, a tablet computer or a desktop computer - more precisely, of a monitor of the desktop computer.
  • a video stream, and in particular the first video stream and/or the second video stream and/or the augmented video stream, is a sequence (more particularly: sequence in time) of frames.
  • a frame comprises image data, e.g., it can constitute image data.
  • the image data are 2D color data, such as data describing RGB pixels.
  • in instances, the image data are volumetric data, such as a stack of image data or differently defined data describing properties at voxels (“3D pixels”), such as color information at grid points of a 3D grid in a volume.
  • a frame can, in instances, comprise more than one data layer. More particularly, a frame can comprise, in addition to said image data (constituting a data layer) or to said volumetric data (constituting a data layer), one or more additional data layers.
  • additional data layers can comprise (and in particular constitute), e.g., depth image data (indicative of distances along a depth direction), confidence level data (indicative of a reliability or trustability of data of another layer), meta data, such as pose data of an object in the frame.
  • a frame and, more particularly each layer of a frame can optionally comprise one or more (fully) transparent pixels (or voxels), i.e. pixels (or voxels) bearing no information (such as no color information, no depth information, no confidence information).
  • The size and shape of a frame are not particularly limited, e.g., a frame does not need to be contiguous and does not need to be rectangular; this also applies to simple color image frames.
  • a color image frame, e.g., of the second video stream, may show a representation (or view) of a (e.g., small round) virtual object only; alternatively, e.g., also as a possibility for the second video stream, a frame having this content can be contiguous and rectangular, namely by comprising, as mentioned above, transparent pixels:
  • the frame then shows, e.g., a representation (or view) of a (e.g., small round) virtual object, while all other pixels are (fully) transparent.
  • the video streams are in an uncompressed data format. This can improve performance and thus facilitate the real-time processing.
  • the video streams could also be in a compressed data format.
  • the first video stream consists of real-time video stream data from the real video camera.
  • the first video stream is simply the (unaltered) output of the real video camera.
  • the first video unit can be identical to the real video camera, e.g., in this case.
  • the first video stream is obtained by altering, e.g., processing the real-time video stream data from the real video camera.
  • the first video stream can comprise merely a portion of the real-time video stream data; or the first video stream can comprise further video information, such as virtual contents, e.g., by replacing a portion of the real-time video stream data by virtual contents.
  • the first video unit can comprise further real and/or virtual video cameras and/or a processing unit, e.g., in this case.
  • the real object is a physically existing object - in contrast to a virtual object.
  • it can be a movable object, particularly in the sense that its size and weight are such that it can be readily moved by an average human being, e.g., using a hand only.
  • the real object is a part of a human body, in particular a hand or a part of a hand. This enables simple and intuitive operation.
  • the real object is a hand-held device.
  • the real object is an item of office supplies, such as a writing utensil.
  • the real object is a handheld computing device, such as a smartphone or a tablet computer.
  • the real object is an add-on-device for such a handheld computing device, in particular attached to the handheld computing device.
  • the real object is a device comprising one or more components of the sensing unit, in particular one or more sensors of the sensing unit, e.g., the device can comprise one or more sensors for sensing its position and/or its orientation in space.
  • Said device can be, e.g., a handheld computing device as mentioned above, said sensor being, e.g., a built-in sensor of the handheld computing device.
  • said device can be, e.g., an add-on device as mentioned above, said one or more sensors being built-in sensors of the add-on device.
  • the relative pose data could also be referred to as or considered “arrangement data”, as they describe the relative arrangement (in space) of the real object and the real video camera.
  • the term “relative” pose/position/orientation of (or between) the real object and the real video camera does not specify whether it is, e.g., a pose/position/orientation of the real object with respect to the real video camera, or a pose/position/orientation of the real video camera with respect to the real object. It can be, e.g., any of these.
  • the respective pose of the real object and of the real video camera is determined relative to one and the same coordinate system, and from this, the relative pose is determined.
  • alternatively, the pose of the real video camera is determined in one coordinate system, and the pose of the real object is determined in another coordinate system different from the first one. Then, the two coordinate systems are interrelated, e.g., by a calibration procedure, and from this, finally the relative pose is determined.
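  • The two approaches just described can be expressed compactly; the following Python sketch is an illustration under assumptions (not taken from the disclosure), with poses represented as 4x4 homogeneous transforms and hypothetical names T_world_camera, T_tracker_object and T_calib.

```python
# Hedged sketch: deriving the relative pose data A either from two poses expressed in
# one common coordinate system, or from an object pose tracked in its own coordinate
# system plus a calibration transform linking that system to the camera's system.
import numpy as np

def relative_pose(T_world_camera: np.ndarray, T_world_object: np.ndarray) -> np.ndarray:
    """Pose of the real object expressed in the real video camera's frame,
    both inputs being 4x4 homogeneous transforms in one common ('world') frame."""
    return np.linalg.inv(T_world_camera) @ T_world_object

def relative_pose_via_calibration(T_calib: np.ndarray, T_tracker_object: np.ndarray) -> np.ndarray:
    """Same result when the object pose is tracked in a separate coordinate system:
    T_calib maps the tracker's frame into the camera's frame and is obtained once,
    e.g., in a calibration procedure (see the calibration sketch further below)."""
    return T_calib @ T_tracker_object
```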
  • for determining a pose (or at least a position in space or an orientation in space) of an object (such as of the real object or the real video camera), various techniques are known and available, such as accelerometric techniques; gyroscopic techniques; gravity-based techniques; and techniques based on determination of the environment of the object, which can comprise, e.g., algorithmic evaluation of video data, image analysis with object recognition, machine-learning supported evaluation of video data, deep-learning supported evaluation of video data, artificial-intelligence-based evaluation of video data, and depth-sensing techniques (e.g., based on time-of-flight sensing, structured light imaging, stereo imaging, or LiDAR); as well as combinations of two or more of these.
  • sensor combinations which could be used comprise, e.g., sensor combinations associated with augmented reality development toolkits such as “ARKit” (by Apple), “ARCore” (by Google), “Vuforia” (by PTC), or sensors of “Kinect” (by Microsoft).
  • the sensing unit can comprise one or more sensors, such as sensors or sensor combinations as mentioned above and, usually also a processing unit for processing, e.g., interrelating data and/or evaluating data, such as converting sensor raw data into calibrated data.
  • the receiving by the second video unit of the relative pose data from the sensing unit is accomplished in a wireless fashion.
  • the relative pose data can be transmitted from the sensing unit to the second video unit in a wireless fashion.
  • the sensing unit is distributed over two or more devices. Communication between the devices for exchange of data related to sensing results obtained by at least one sensor of the sensing unit can be accomplished in a wireless fashion, in particular if a portion of the sensing unit, e.g., a sensor, is comprised in the real object. Accordingly, the sensing unit can comprise a wireless communication capability, e.g., embodied as communication units in the respective devices.
  • a portion of the sensing unit can be comprised in the real object, and another portion in another device, such as in a computing device, e.g., in a computing device comprising the real video camera.
  • the second video unit can be embodied in the form of software implemented in a computing device, in particular in a graphics processing unit (GPU) of the computing device.
  • the video processing unit can be embodied in the form of software running on a computing device, in particular in a graphics processing unit (GPU) of the computing device.
  • the video processing unit can be, e.g., a video mixer.
  • the augmented video stream typically comprises data derived from the first video stream and data derived from the second video stream.
  • the augmented video stream (in a simple example) can comprise at least a portion of the first video stream and at least a portion of the second video stream.
  • Generating the augmented video stream can comprise merging the representation of the virtual object into the first video stream.
  • the augmented video stream can be generated, e.g., in a frame-wise manner.
  • generating the augmented video stream can comprise repeatedly (frame-by-frame) grabbing a frame of the first video stream (first frame) and a (simultaneous) frame of the second video stream (second frame) and creating a new frame from those two frames, which then constitutes a frame of the augmented video stream (augmented frame).
  • the two frames can be merged, overlaid or otherwise combined.
  • e.g., the second video stream (and thus the second frame) shows a representation (or view) of the virtual object, while all other pixels (or voxels) are (fully) transparent.
  • the augmented frame can then be created by replacing, in the first frame, those pixels at which the second frame is not fully transparent (namely by the corresponding pixels of the second frame).
  • a respective pixel of the augmented frame could be obtained using both, data from the respective pixel of the second frame and data from the respective pixel of the first frame.
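  • For illustration (not part of the original disclosure), the following Python sketch shows both variants just described: plain replacement of non-transparent pixels, and a blended combination using data from both frames; the RGBA encoding of the second frame is an assumption.

```python
# Hedged sketch of frame-wise merging. first_rgb: H x W x 3 uint8 frame of the first
# video stream; second_rgba: H x W x 4 uint8 frame of the second video stream, whose
# alpha channel marks fully transparent pixels (alpha == 0).
import numpy as np

def augment_by_replacement(first_rgb: np.ndarray, second_rgba: np.ndarray) -> np.ndarray:
    out = first_rgb.copy()
    visible = second_rgba[..., 3] > 0                  # not fully transparent
    out[visible] = second_rgba[..., :3][visible]       # replace by second-frame pixels
    return out

def augment_by_blending(first_rgb: np.ndarray, second_rgba: np.ndarray) -> np.ndarray:
    a = second_rgba[..., 3:4].astype(np.float32) / 255.0
    return (a * second_rgba[..., :3] + (1.0 - a) * first_rgb).astype(np.uint8)
```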
  • generating the augmented video stream can be accomplished without separately storing (e.g., in computer memory) the second video stream, not even for a single frame. Namely by generating, e.g., frame-by-frame, the data representing the representation of the virtual object (second frame) and storing these data in locations (e.g., in computer memory) where the data of the first video unit (of the simultaneous frame; first frame) are stored (e.g., in computer memory). This way, less memory is used, and less memory read and write operations need to be carried out.
  • the respective frame of the augmented video stream (augmented frame) is “automatically” completed - in the location where initially (and exclusively) the data of the first video stream (first frame) had been stored.
  • the video processing unit can, in this case, merely read the data from that memory and output the same - as a frame of the augmented video stream (augmented frame).
  • the second video stream is factually generated (by the second video unit), as the data representing the representation of the virtual object (or the second frame) are generated, e.g., as a time-sequence of frames, and merely not separately stored, but stored in said locations (computer memory locations) initially taken by data of the first video stream (first frame).
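  • As a sketch of the memory-saving variant described above (an assumption regarding interfaces, not the original implementation), the data representing the virtual object's representation can be written directly into the buffer already holding the first frame:

```python
# Hedged sketch: no separate buffer for the second frame. The second video unit is
# assumed to produce the virtual object's pixels as ((row, col), rgb) tuples, which are
# written straight into the first frame's memory, turning it into the augmented frame.
import numpy as np

def render_virtual_object_in_place(first_frame: np.ndarray, object_pixels) -> np.ndarray:
    for (r, c), rgb in object_pixels:
        first_frame[r, c] = rgb      # overwrite in place; no second buffer is stored
    return first_frame               # the same buffer now holds the augmented frame
```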
  • generating the second video stream can factually effect the generation of the augmented video stream.
  • the second video unit can, in part, coincide with (be identical to) the video processing unit, namely in that the generation of the augmented video stream is accomplished by the second video unit.
  • the second video unit and the video processing unit can be embodied in one and the same unit, such as in one and the same software (program code).
  • the real object is a smartphone with built-in sensors such as sensors associated with an augmented reality development toolkit, such as with “ARKit” (by Apple) or “ARCore” (by Google) or “Vuforia” (by PTC), as components of the sensing unit;
  • the first video unit is a built-in video camera of a computing device (such as of a laptop computer), e.g., built into a monitor of the computing device;
  • the computing device embodies (in form of hardware and software) a processing unit of the sensing unit for evaluating raw data (or other sensing data) received from the sensors;
  • the real object (such as the smartphone) and the computing device comprise wireless communication capability, such as according to a “Bluetooth” standard or “WiFi”, for transmitting and receiving, respectively, the raw data (or other sensing data) from the sensors;
  • the computing device embodies (in form of hardware and software) the second video unit and the video processing unit;
  • the augmented video stream can be outputted to the monitor of the computing device and/or can be forwarded to a video conferencing software, such as to be transmitted via the internet or to a peripheral device.
  • the method comprises moving the real object in space (i.e. in real space).
  • the moving can be accomplished by a user.
  • “the modulating” refers to the modulating of the representation of the virtual object in the second video stream.
  • the modulating is accomplished in such a way that the representation of the virtual object in the augmented video stream moves in dependence of the movement of the real object.
  • the modulating can be accomplished in such a way that the representation of the virtual object in the augmented video stream moves identically to the movement of the real object.
  • the real object can in this regard be considered a pointer for the virtual object.
  • the determining of the relative pose data takes place during the moving of the real object (since it takes place in real-time).
  • the modulating comprises changing the representation of the virtual object in the second video stream in such a way that at least one of
  • an apparent position of the virtual object in the augmented video stream is changed in dependence of the relative position in space of the real video camera and the real object;
  • an apparent orientation of the virtual object in the augmented video stream is changed in dependence of the relative orientation in space of the real video camera and the real object.
  • the modulating comprises changing the representation of the virtual object in the second video stream in such a way that at least one of
  • an apparent position of the virtual object in the augmented video stream is linked to the relative position in space of the real video camera and the real object;
  • an apparent orientation of the virtual object in the augmented video stream is linked to the relative orientation in space of the real video camera and the real object.
  • an apparent position of the virtual object in the augmented video stream can change proportionally to changes of the position of the real object, i.e. along the same direction and along a proportional distance.
  • an apparent orientation of the virtual object in the augmented video stream can change identically to changes of the orientation of the real object.
  • the modulating comprises changing the representation of the virtual object in the second video stream in such a way that changes of the pose of the virtual object in the augmented video stream are identical to changes of the pose of the real object.
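  • The linking described above can be made concrete as follows (a Python sketch under assumptions; the reference pose, the scale factor s and the function name are not from the disclosure): position changes of the real object are applied proportionally (identically for s = 1), orientation changes identically.

```python
# Hedged sketch of linking the virtual object's pose in the augmented stream to the
# real object's pose. Rotations are 3x3 matrices, positions are 3-vectors; "ref_*" is
# the relative pose captured at a reference instant, "anchor_*" the virtual object's
# pose at that instant (all names are assumptions for this sketch).
import numpy as np

def update_virtual_pose(rel_pos, rel_rot, ref_pos, ref_rot, anchor_pos, anchor_rot, s=1.0):
    delta_rot = rel_rot @ ref_rot.T            # orientation change of the real object
    delta_pos = rel_pos - ref_pos              # position change of the real object
    virt_rot = delta_rot @ anchor_rot          # applied identically to the virtual object
    virt_pos = anchor_pos + s * delta_pos      # applied proportionally (same direction)
    return virt_pos, virt_rot
```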
  • the movements of the virtual object in the augmented video stream and the movements of the real object (in real space) are coordinated such that the two seem to “move together”.
  • if the real object is represented (visible) in the augmented video stream, this can provide, in the augmented video stream, the illusion that the virtual object is firmly connected to the real object.
  • virtual movements of the representation of the virtual object in the augmented video stream can (effectively) be controlled by moving the real object.
  • the real object functions as a movement controller for controlling virtual movements of the representation of the virtual object in the augmented video stream.
  • Movements and virtual movements can comprise one or both of position changes and orientation changes, i.e. can comprise pose changes.
  • the method comprises moving the real object relative to the real video camera.
  • Moving the real object relative to the real video camera can comprise, e.g., moving the real object in real space. And/or during the moving (of the real object relative to the real video camera), the real video camera can remain unmoved (remain still).
  • the method further comprises
  • the real object may enter a viewport of the real video camera.
  • This can create interesting impressions in the augmented video stream (the real object appearing as a pointer to or holder of the virtual object) and/or be useful for calibration purposes.
  • the method further comprises
  • a corresponding calibration procedure can, in this case, comprise:
  • all three axes of the coordinate system of the viewport can be associated with the coordinate system of the real object.
  • the real video camera is assumed to remain unmoved at least during the calibration procedure.
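  • One possible calibration, sketched below in Python purely as an illustration (the reference pose and all names are assumptions, not the disclosed procedure), captures the object's tracked pose while the object is held at a known pose relative to the unmoved real video camera, and derives from it a fixed transform linking the two coordinate systems:

```python
# Hedged calibration sketch: while the real video camera stays still, the real object
# is held at a known reference pose relative to the camera; the simultaneously tracked
# object pose then yields a fixed transform from the tracker's frame to the camera's frame.
import numpy as np

def calibrate(T_camera_object_reference: np.ndarray,
              T_tracker_object_snapshot: np.ndarray) -> np.ndarray:
    """Returns T_calib (camera frame <- tracker frame); afterwards every newly tracked
    pose can be converted via T_camera_object = T_calib @ T_tracker_object."""
    return T_camera_object_reference @ np.linalg.inv(T_tracker_object_snapshot)
```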
  • the method for video conferencing can comprise generating an augmented video stream, e.g., as described above, and feeding the augmented video stream to a video conferencing software.
  • the feeding can comprise the device driver software either applying modifications, in particular one or more transformations, to the augmented video stream and feeding the so-modified augmented video stream to the video conferencing software; or feeding the (unmodified) augmented video stream to the video conferencing software.
  • the device driver software can be configured such that the only modifications it can apply to a video stream (to the augmented video stream) are transformations. Transformations are format changes, such as changes in color bit depth, changes in the number of pixels per image of the video stream and the like.
  • the device driver can effect that the augmented video stream is accepted by a computer operating system in the same way as a device driver of a standard (real) video camera (such as the real video camera) is accepted by the computer operating system.
  • the augmented video stream can be made readily available to further computer programs, such as to standard video conferencing software.
  • the system - comprising the device driver - can be considered to comprise a virtual camera.
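  • As an illustration only (the patent speaks of device driver software in general, not of any particular library), a Python sketch of exposing the augmented stream as a virtual camera is given below; it relies on the third-party pyvirtualcam package, which is an assumption made for this sketch.

```python
# Hedged sketch: publish augmented frames as a virtual camera device so that standard
# video conferencing software can select them like a real webcam. pyvirtualcam is used
# here purely for illustration and is not mentioned in the original disclosure.
import pyvirtualcam

def stream_augmented(frames, width=1280, height=720, fps=30):
    """frames: an iterator yielding H x W x 3 uint8 RGB augmented frames."""
    with pyvirtualcam.Camera(width=width, height=height, fps=fps) as cam:
        for frame in frames:
            cam.send(frame)                  # hand the augmented frame to the virtual camera
            cam.sleep_until_next_frame()     # pace the output to the requested frame rate
```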
  • the method further comprises the device driver software registering itself with an operating system as a camera device driver.
  • When the device driver software is configured such that it registers itself with an operating system as a camera device driver upon its installation on a computer on which the operating system is executed, a great simplification for a user is achieved.
  • the method steps may be performed in any order (sequence) including simultaneous performance of steps.
  • the invention comprises apparatuses (systems) with features of corresponding methods according to the invention, and, vice versa, also methods with features of corresponding apparatuses (systems) according to the invention.
  • Fig. 1 shows a schematic diagram illustrating a system for generating an augmented video stream, also for explication of the corresponding method;
  • Fig. 2 shows a schematic illustration of a way of generating an augmented frame;
  • Fig. 3 shows a schematic illustration of another way of generating an augmented frame.
  • Fig. 1 shows a schematic diagram illustrating a system for generating an augmented video stream S3, which furthermore is used to explain a method for generating the augmented video stream S3.
  • the system - which can work in real-time - comprises a real object rO, such as a smartphone, and a real video camera C which constitutes or is part of a first video unit. Further, it comprises a sensing unit which comprises a processing unit P and one or more sensors (symbolized in Fig. 1 by coordinate systems). The system also comprises a second video unit V and a video processing unit M.
  • the first video unit generates a first video stream S1, typically showing a real scene visible in a field of view of real video camera C (symbolized in Fig. 1 by thin dotted lines).
  • the second video unit V generates a second video stream S2 comprising a representation of a virtual object vO.
  • the first video stream S1 and the second video stream S2 are used to create the augmented video stream S3, e.g., by merging the two video streams S1, S2.
  • the sensing unit (e.g., its processing unit P) outputs relative pose data A which are used by second video unit V for modulating the representation in the second video stream S2 of the virtual object vO.
  • the relative pose data A characterize a relative position (in real space) and a relative orientation (in real space) of the real object rO and the real video camera C. If a user now moves the real object rO (the moving symbolized in Fig. 1 by the hollow arrows), the representation of the virtual object vO in the second video stream S2 and in the augmented video stream S3 can change in dependence thereof, e.g., in a corresponding way, as effected by the second video unit V. Thus, the real object rO can be used to control the pose (position and orientation) of the virtual object vO (in the augmented video stream S3).
  • a way to determine the relative pose data A comprises determining the pose of the real object rO, e.g., by means of sensors comprised in the real object, such as sensors associated with an augmented reality development toolkit, such as “ARKit” in case the real object is an “Apple” “iPhone 12 Pro”.
  • Pose data of the real object rO obtained in such a way are not the sought relative pose data A.
  • a link between a coordinate system on which the pose data of the real object rO are based and a coordinate system associated with the real video camera C can make possible to transform the pose data of the real object rO into the relative pose data A. Such a link can be accomplished in a calibration procedure.
  • processing unit P can determine the relative pose data A. This works well, at least as long as the pose of real video camera C remains unchanged, i.e. as long as real video camera C is still (not moved).
  • An alternative way of determining the relative pose data A is to determine the pose data of the real object rO and the pose data of the real video camera C, as symbolized in Fig. 1, both with respect to one and the same coordinate system. Therefrom, the relative pose data A can be readily determined, possibly even without requiring a calibration procedure. This way works well also when real video camera C is moved.
  • Various further ways of sensing to finally derive the relative pose data A are enabled, in particular considering today’s commercially available or built-in sensors, which comprise, e.g., depth-sensing techniques, such as techniques based on time-of-flight, structured light, stereo vision, or monocular vision in combination with algorithmic methods, machine-learning-based methods, deep-learning-based methods or artificial-intelligence-based methods.
  • the first video stream S1 may be used as a sensor of the sensing unit, e.g., the sensing being based on image analysis with object recognition and/or algorithmic methods, machine-learning-based methods, deep-learning-based methods or artificial-intelligence-based methods, such as to at least partially determine the relative pose data - which bears the advantage that a so-detected pose of the real object rO can already be - intrinsically - the relative pose (because the images forming the basis for the determination of the relative pose data are always in the coordinate system of the real video camera). A sketch of this idea follows below.
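  • The following Python sketch is an assumption-laden illustration (not the disclosed method): it estimates the real object's pose directly from an image of the first video stream via OpenCV's solvePnP, assuming known 3D feature points on the object and their detected 2D image positions; the result is intrinsically the relative pose in the camera's coordinate system.

```python
# Hedged sketch: using the first video stream as a "sensor". Known 3D feature points on
# the real object and their detected 2D positions in a frame of stream S1 yield the
# object's pose directly in the coordinate system of real video camera C.
import numpy as np
import cv2

def object_pose_from_image(object_points_3d, image_points_2d, camera_matrix, dist_coeffs=None):
    ok, rvec, tvec = cv2.solvePnP(np.asarray(object_points_3d, np.float32),
                                  np.asarray(image_points_2d, np.float32),
                                  camera_matrix, dist_coeffs)
    if not ok:
        return None                              # pose could not be estimated for this frame
    R, _ = cv2.Rodrigues(rvec)                   # rotation: object frame -> camera frame
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()        # relative pose A as a homogeneous transform
    return T
```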
  • if the real video camera C is a 2D-color-plus-depth camera or even a 3D video camera, the relative pose data A may be even more precisely determinable.
  • the processing unit P, the second video unit V and the video processing unit M may, for example, be implemented as software (program code) running on a computer of the system, such as on a laptop or desktop computer or a mobile, e.g., handheld or head-mounted computing device, symbolized in Fig. 1 by the large dashed rectangle.
  • Another piece of software may be comprised in the system and may be implemented in the computer, functioning as a device driver to be recognized by an operating system running in the computer as a camera device driver.
  • the device driver is symbolized as a small dashed rectangle. It receives the augmented video stream S3 and outputs it (or a video stream derived from augmented video stream S3, e.g., by applying format changes), so that it can be readily fed to further programs such as to a standard video conferencing software.
  • the real video camera C can be, e.g., a camera of the computer, such as a built-in camera of a monitor of a desktop computer or the built-in camera of a laptop computer. Or it can be a camera operationally connected to the computer, such as a peripheral camera device, e.g., connected to the computer in a wirebound or wireless fashion.
  • Figs. 2 and 3 each show a schematic illustration of a way of generating an augmented frame F3, i.e. a frame of the augmented video stream S3.
  • In a first frame F1, i.e. a frame of the first video stream S1, a representation of real-time video stream data from real video camera C is illustrated.
  • In a second frame F2, i.e. a frame of the second video stream S2, a representation of the virtual object vO is illustrated.
  • In augmented frame F3, a result of merging frames F1 and F2 is illustrated.
  • Fig. 2 illustrates that frame F1 is stored in a first memory location and that frame F2 is stored in a second memory location. Then, by video processing unit M, frame F3 is generated from frames F1 and F2 in a third memory location.
  • Fig. 3 illustrates that frame F1 is stored in a first memory location. Then, during generation of the data representing the representation of the virtual object vO, these data are written (stored) into said first memory location, e.g., by simply overwriting the corresponding data of the first frame F1; the middle portion of Fig. 3 illustrates the situation after a bit more than half of said data representing the representation of the virtual object vO have been generated and stored in the first memory location. And finally (cf. right portion of Fig. 3), all data representing the representation of the virtual object vO are generated and stored in said first memory location - and thus, the augmented frame F3 is generated from frames F1 and F2.
  • the invention makes possible intuitive and simple-to-use virtual object modifications and uses of the generated augmented video stream. Aspects of the embodiments have been described in terms of functional units. As is readily understood, these functional units may generally be realized in virtually any number of hardware and/or software components adapted to performing the specified functions.

Abstract

The method for generating an augmented video stream (S3) comprises providing a real video camera (C); generating a first video stream (S1) comprising real-time video stream data from the real video camera (C); providing a real object (rO); determining in real time relative pose data (A) indicative of a relative position in space and a relative orientation in space of the real video camera (C) and the real object (rO); generating a second video stream (S2) comprising a representation of a virtual object (vO); modulating the representation of the virtual object (vO) in the second video stream (S2) in real time and in dependence on the relative pose data (A); and outputting and/or generating the augmented video stream (S3) from the first video stream (S1) and the second video stream (S2) in real time. In this way, a pose of the virtual object (vO) can be controlled by moving the real object (rO).
PCT/EP2022/070186 2021-07-19 2022-07-19 Caméra virtuelle à réalité augmentée WO2023001816A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163223182P 2021-07-19 2021-07-19
US63/223,182 2021-07-19

Publications (1)

Publication Number Publication Date
WO2023001816A1 (fr)

Family

ID=82899100

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/070186 WO2023001816A1 (fr) 2021-07-19 2022-07-19 Caméra virtuelle à réalité augmentée

Country Status (1)

Country Link
WO (1) WO2023001816A1 (fr)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111970535A (zh) * 2020-09-25 2020-11-20 魔珐(上海)信息科技有限公司 Virtual livestreaming method, apparatus, *** and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ADRIAN DAVID CHEOK ET AL: "Human Pacman: a mobile, wide-area entertainment system based on physical, social, and ubiquitous computing", PERSONAL AND UBIQUITOUS COMPUTING, SPRINGER VERLAG, LONDON, GB, vol. 8, no. 2, 1 May 2004 (2004-05-01), pages 71 - 81, XP058126351, ISSN: 1617-4909, DOI: 10.1007/S00779-004-0267-X *
ANDERS HENRYSSON ET AL: "Mobile phone based AR scene assembly", MOBILE AND UBIQUITOUS MULTIMEDIA, ACM, 2 PENN PLAZA, SUITE 701 NEW YORK NY 10121-0701 USA, 8 December 2005 (2005-12-08), pages 95 - 102, XP058363160, ISBN: 978-0-473-10658-4, DOI: 10.1145/1149488.1149504 *
ZHANPENG HUANG ET AL: "Mobile augmented reality survey: a bottom-up approach", 17 September 2013 (2013-09-17), XP055134545, Retrieved from the Internet <URL:http://arxiv.org/abs/1309.4413> [retrieved on 20190726] *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22754362

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE