US20210247846A1 - Gesture tracking for mobile rendered augmented reality - Google Patents

Gesture tracking for mobile rendered augmented reality

Info

Publication number
US20210247846A1
Authority
US
United States
Prior art keywords
hand
machine learning
learning model
image
formation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US17/170,255
Inventor
Ketaki Lalitha Uthra Shriram
Jhanvi Samyukta Lakshmi Shriram
Yusuf Olanrewaju Olokoba
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Krikey Inc
Original Assignee
Krikey Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Krikey Inc filed Critical Krikey Inc
Priority to US17/170,255 priority Critical patent/US20210247846A1/en
Assigned to Krikey, Inc. reassignment Krikey, Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: OLOKOBA, YUSUF OLANREWAJU, SHRIRAM, KETAKI LALITHA UTHRA, SHRIRAM, JHANVI SAMYUKTA LAKSHMI
Publication of US20210247846A1 publication Critical patent/US20210247846A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50Controlling the output signals based on the game progress
    • A63F13/53Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game
    • A63F13/537Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game using indicators, e.g. showing the condition of a game character on screen
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06K9/6267
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • G06T15/20Perspective computation
    • G06T15/205Image-based rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • G06V20/647Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20Input arrangements for video game devices
    • A63F13/21Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/213Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/25Output arrangements for video game devices
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/65Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor automatically by game devices or servers from real world data, e.g. measurement in live racing competition
    • A63F13/655Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor automatically by game devices or servers from real world data, e.g. measurement in live racing competition by importing photos, e.g. of the player
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/80Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
    • A63F2300/8082Virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/24Indexing scheme for image data processing or generation, in general involving graphical user interfaces [GUIs]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/21Collision detection, intersection

Definitions

  • the disclosure generally relates to the field of mobile rendered augmented reality and more specifically to object control based on gesture tracking in mobile rendered augmented reality environments.
  • Conventional augmented reality (AR) systems use handheld controllers to track a user's hands and determine whether the user is making a hand gesture.
  • the tracked gestures are limited to what gestures can be made with a controller in the user's hands.
  • the user cannot make a “five” gesture (i.e., palm open and fingers/thumb extended) without dropping the controller.
  • Mobile devices may execute AR applications without handheld controllers, but execution of such applications slows because a mobile device has limited processing bandwidth and/or limited battery life for intensive processing.
  • Hence, there lacks an AR system that can track user gestures without demanding large processing bandwidth or excessively consuming a device's power.
  • FIG. 1 illustrates an augmented reality (AR) system environment, in accordance with at least one embodiment.
  • FIG. 2 is a block diagram of the gesture tracking application of FIG. 1 , in accordance with at least one embodiment.
  • FIG. 3 is a flowchart illustrating a process for controlling an AR object using gesture detection, in accordance with at least one embodiment.
  • FIG. 4 is a flowchart illustrating a process for controlling the AR object using gesture detection based on the process of FIG. 3 , in accordance with at least one embodiment.
  • FIGS. 5A and 5B illustrate user interactions with an AR application that integrates gesture tracking to control AR objects, in accordance with at least one embodiment.
  • FIG. 6 illustrates a block diagram including components of a machine able to read instructions from a machine-readable medium and execute them in a processor (or controller), in accordance with at least one embodiment.
  • an augmented reality (AR) object is rendered on a mobile client based on a user's gestures identified within a camera view of the mobile client.
  • Conventional gesture tracking solutions for AR are designed for handheld hardware to track where a user's hands are and determine if a gesture is being made. Accordingly, described is a configuration that enables gesture tracking to control an AR object in a mobile rendered AR system while optimizing for the power and processing constraints of the mobile client.
  • a camera coupled with the mobile client captures a camera view of an environment.
  • the environment may correspond to the physical world, which may include a portion of a user's body positioned within a field of view of the camera.
  • A processor (e.g., of the mobile device) receives an image from the camera view of the environment and applies a machine learning model to the received image.
  • the processor identifies a gesture, which may also be referred to herein as a “formation,” made by the hand depicted within the image.
  • the processor determines, based on the identified gesture, a state in which to render an AR object (e.g., from an AR engine).
  • the processor provides for display (e.g., transmits instructions (e.g., program code or software) to render on a screen) the rendered object in the environment (e.g., with the user's hand) to the mobile client.
  • a user interacts with the displayed AR objects.
  • the processor may determine not to store the image into memory or remove the image from memory such that the processor will not expend further processing resources on the frame (e.g., due to its lack of depicting an informative object).
  • the machine learning model for detecting gestures may require more processing resources than a machine learning model for detecting hands. In these circumstances, the processor may optimize the mobile client's processing resources by applying a hand detection model to an image first rather than directly applying a gesture tracking model.
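  • The frame-gating order described above can be sketched as follows. This is an illustrative aside rather than patent text; the Detection type and the detect_hand/classify_gesture callables are hypothetical stand-ins for the hand detection and gesture detection models.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Detection:
    label: str          # e.g., "hand", "five", "fist", or "none"
    confidence: float   # model confidence in [0, 1]

def process_frame(
    frame: bytes,
    detect_hand: Callable[[bytes], Detection],       # hypothetical lightweight hand model
    classify_gesture: Callable[[bytes], Detection],  # hypothetical heavier gesture model
) -> Optional[Detection]:
    """Return a gesture detection, or None if the frame should be discarded."""
    hand = detect_hand(frame)
    if hand.label != "hand":
        return None                    # no hand: discard the frame, skip the gesture model
    return classify_gesture(frame)     # hand present: spend the extra compute
```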
  • Gesture tracking allows the user to use a mobile device to control and interact with AR objects without dedicated hardware as though they are interacting with reality around the user, presenting an immersive gaming experience for the user.
  • the methods described herein allow for AR object control using gesture tracking on a mobile client that does not consume too much processing and/or battery power.
  • FIG. 1 illustrates an augmented reality (AR) system environment, in accordance with at least one embodiment.
  • the AR system environment enables AR applications on a mobile client 100 , and in some embodiments, presents immersive experiences to users via gesture tracking.
  • the system environment includes a mobile client 100 , an AR system 110 , an AR engine 120 , a gesture tracking application 130 , a database 140 , and a network 150 .
  • the AR system 110 in some example embodiments, may include the mobile client 100 , the AR engine 120 , the gesture tracking application 130 , and the database 140 .
  • the AR system 110 may include the AR engine 120 , the gesture tracking application 130 , and the database 140 , but not the mobile client 100 , such that the AR system 110 communicatively couples (e.g., wireless communication) to the mobile client 100 from a remote server.
  • the mobile client 100 is a mobile device that is or incorporates a computer.
  • the mobile client may be, for example, a relatively small computing device in which network, processing (e.g., processor and/or controller) and power resources (e.g., battery) may be limited and have a formfactor size such as a smartphone, tablet, wearable device (e.g., smartwatch) and/or a portable internet enabled device.
  • the limitations of such devices extend from scientific principles that must be adhered to in designing such products for portability and use away from constant power draw sources.
  • the mobile client 100 may be a computing device that includes the components of the machine depicted in FIG. 6 .
  • the mobile client 100 has general and/or special purpose processors, memory, storage, networking components (either wired or wireless).
  • the mobile client 100 can communicate over one or more communication connections (e.g., a wired connection such as ethernet or a wireless communication via cellular signal (e.g., LTE, 5G), WiFi, satellite) and includes a global positioning system (GPS) used to determine a location of the mobile client 100 .
  • the mobile client 100 also includes one or more cameras 102 that can capture forward and rear facing images and/or videos.
  • the camera 102 may be a two-dimensional (2D) camera as opposed to a stereo camera or a three-dimensional (3D) camera. That is, the machine-learned detection described herein does not necessarily require a 3D image or depth to classify a hand or a gesture depicted within an image.
  • the mobile client 100 also includes a screen (or display) 103 and a display driver to provide for display interfaces on the screen 103 associated with the mobile client 100 .
  • the mobile client 100 executes an operating system, such as GOOGLE ANDROID OS and/or APPLE iOS, and includes the screen 103 and/or a user interface that the user can interact with.
  • the mobile client 100 couples to the AR system 110 , which enables it to execute an AR application (e.g., the AR client 101 ).
  • the AR engine 120 interacts with the mobile client 100 to execute the AR client 101 (e.g., an AR game).
  • the AR engine 120 may be a game engine such as UNITY and/or UNREAL ENGINE.
  • the AR engine 120 displays, and the user interacts with, the AR game via the mobile client 100 .
  • the mobile client 100 may host and execute the AR client 101 that in turn accesses the AR engine 120 to enable the user to interact with the AR game.
  • While the AR application refers to an AR gaming application in many instances described herein, the AR application may be a retail application integrating AR for modeling purchasable products, an educational application integrating AR for demonstrating concepts within a learning curriculum, or any suitable interactive application in which AR may be used to augment the interactions.
  • the AR engine 120 is integrated into and/or hosted on the mobile client 100 . In other embodiments, the AR engine 120 is hosted external to the mobile client 100 and communicatively couples to the mobile client 100 over the network 150 .
  • the AR system 110 may comprise program code that executes functions as described herein.
  • the AR system 110 includes the gesture tracking application 130 .
  • the gesture tracking application enables gesture tracking in the AR game such that AR objects (e.g., virtual objects rendered by the AR engine 120 ) and their behaviors or states may be controlled by the user.
  • the user may capture an image and/or video of an environment captured within a camera view of the camera 102 of the mobile client 100 .
  • An image from the camera view may depict a portion of the user's body such as the user's hand.
  • the AR engine 120 renders an AR object, where the rendering may be based on gestures that the gesture tracking application 130 has determined that the user is performing (e.g., a fist).
  • the gesture tracking application 130 may detect or track (e.g., detecting changes in gestures over time) a variety of gestures such as a five (i.e., an open palm facing away from the user), a fist, pointing, waving, facial gestures (e.g., smiles, open mouth, etc.), lifting a leg, bending, kicking, or any suitable movement made by any portion of the body to express an intention. While the gesture tracking application 130 is described herein as primarily tracking hand gestures, the gesture tracking application 130 may detect various gestures as described above. As referred to herein, “gesture” and “formation” may be used interchangeably.
  • the gesture tracking application 130 identifies a body part (e.g., a hand) within an image from a camera view captured by the camera 102 , determines a gesture made by the body part (e.g., a five), and renders an AR object based on the determined gesture.
  • the gesture tracking application 130 may instruct the camera 102 to capture image frames periodically (e.g., every three seconds).
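  • As a minimal sketch of the periodic capture just described (illustrative only; capture_frame and handle_frame are hypothetical stand-ins for the camera 102 and the gesture tracking pipeline, and the three-second interval is taken from the example above):

```python
import time

def capture_loop(capture_frame, handle_frame, interval_s=3.0, max_frames=10):
    """Capture a frame, pass it to the gesture pipeline, then wait before the next capture."""
    for _ in range(max_frames):
        handle_frame(capture_frame())   # capture_frame() stands in for camera 102
        time.sleep(interval_s)          # e.g., capture an image frame every three seconds
```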
  • the state in which the AR engine 120 displays the AR object depends on input from the user during game play. For example, the direction of the AR object's movement may change depending on the user's gestures. FIGS. 5A and 5B illustrate examples of such interactions.
  • the AR system 110 includes applications instead of and/or in addition to the gesture tracking application 130 .
  • the gesture tracking application 130 may be hosted on and/or executed by the mobile client 100 .
  • the gesture tracking application 130 is communicatively coupled to the mobile client 100 .
  • the database 140 stores images or videos that may be used by the gesture tracking application 130 to detect a user's hand and determine the gesture the hand is making.
  • the mobile client 100 may transmit images or videos collected by the camera 102 during the execution of the AR client 101 to the database 140 .
  • the data stored within the database 140 may be collected from a single user (e.g., the user of the mobile client 100 ) or multiple users (e.g., users of other mobile clients that are communicatively coupled to the AR system 110 through the network 150 ).
  • the gesture tracking application 130 may use images and/or videos of gestures stored in the database 140 to train a model (e.g., a neural network).
  • the machine learning model training engine 210 of the gesture tracking application 130 may access the database 140 to train a machine learning model. This is described in further detail in the description of FIG. 2 .
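  • A minimal training sketch under stated assumptions: the database exposes labeled (image, label) pairs, and a linear SVM, one of the model families listed later in this description, stands in for the gesture detection model; the actual implementation may differ.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_gesture_model(examples):
    """examples: list of (image_array, label) pairs, with labels such as "five" or "fist"."""
    X = np.stack([img.reshape(-1) for img, _ in examples])  # flatten each image to a feature vector
    y = np.array([label for _, label in examples])
    model = LinearSVC()     # illustrative stand-in for the gesture detection model 221
    model.fit(X, y)
    return model
```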
  • the database 140 may store a mapping of gestures to AR objects and/or states in which the AR objects may be rendered.
  • the gesture tracking application 130 may determine a gesture made by a user's hand as captured within a camera view of the mobile client 100 and access the database 140 to determine, using the determined gesture, that an AR object should be rendered in a particular state (e.g., an AR ball floating upward).
  • the database 140 may store one or more user profiles, each user profile including user customizations or settings that personalize the user's experience using the AR client 101 .
  • a user profile stored within the database 140 may store a user-specified name of a custom hand gesture, images of the customized hand gesture (e.g., taken by the user using the mobile client 100 ), and a user-specified mapping of a gesture to the customized hand gesture.
  • the network 150 transmits data between the mobile client 100 and the AR system 110 .
  • the network 150 may be a local area and/or wide area network that uses wired and/or wireless communication systems, such as the internet.
  • the network 150 includes encryption capabilities to ensure the security of data, such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), internet protocol security (IPsec), etc.
  • FIG. 2 is a block diagram of the gesture tracking application 130 of FIG. 1 , in accordance with at least one example embodiment.
  • the gesture tracking application 130 includes a machine learning model training engine 210 , a gesture tracking module 220 , an AR object state module 230 , and a rendering module 240 .
  • the gesture tracking module 220 further includes a gesture detection model 221 and a hand detection model 222 .
  • the gesture tracking application 130 includes modules other than those shown in FIG. 2 .
  • the modules may be embodied as program code (e.g., software comprised of instructions stored on non-transitory computer readable storage medium and executable by at least one processor such as the processor 602 in FIG. 6 ) and/or hardware (e.g., application specific integrated circuit (ASIC) chips or field programmable gate arrays (FPGA) with firmware).
  • the modules correspond to at least having the functionality described when executed/operated.
  • the process of detecting a gesture and modifying the state of an AR object based on the detected gesture may begin with the gesture tracking module 220 receiving an image from the mobile client 100 .
  • the image may be taken by camera 102 and transmitted to the gesture tracking application 130 by the AR client 101 to determine whether a gesture is depicted within the image to control an AR object.
  • the gesture tracking module 220 may apply one or more of trained models such as the gesture detection model 221 and the hand detection model 222 to determine the gesture depicted in the received image.
  • the models may be trained by the machine learning model training engine 210 .
  • the gesture tracking module 220 may provide the classification to the AR object state module 230 , which subsequently determines a state in which an AR object should be rendered.
  • the AR object state module 230 provides the determined state to the rendering module 240 , which may request an AR object in a particular state from the AR engine 120 and provide the AR object received from the AR engine 120 for display at the screen 103 of the mobile client 100 .
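  • The module-to-module flow above can be summarized in a short sketch (illustrative only; the object names and method signatures are assumptions, not the application's actual API):

```python
def handle_image(image, gesture_tracker, state_module, renderer):
    """One pass through the pipeline: classify, choose an AR object state, request rendering."""
    gesture = gesture_tracker.classify(image)    # e.g., "five", "fist", or None
    if gesture is None:
        return                                   # nothing detected; leave the scene unchanged
    state = state_module.next_state(gesture)     # map the gesture to an AR object state
    renderer.render(state)                       # request the AR engine to draw the object
```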
  • the machine learning model training engine 210 applies training data sets to the gesture detection model 221 or the hand detection model 222 .
  • the training engine 210 may create training data sets based on data from the database 140 .
  • the training data sets may include positive or negative samples of hand gestures or hands.
  • the training data sets may be labeled according to the presence, or lack thereof, of a hand gesture or hand.
  • the labels may be provided to the training engine 210 from a user (e.g., using a user input interface of the mobile client 100 ). This may enable the training engine 210 to train, for example, the gesture detection model 221 to classify an image of a custom user gesture according to a user-specified label.
  • the machine learning model training engine 210 may create a training data set using images of a custom gesture.
  • the gesture tracking module 220 may prompt (e.g., through the screen 103 ) the user to make a custom gesture and instruct the user to capture multiple images (e.g., in different positions or angles) of the gesture using the camera 102 .
  • the gesture tracking module 220 receives the captured images and transmits them to the database 140 for storage.
  • the machine learning model training engine 210 may use the captured images to train the gesture detection model 221 or the hand detection model 222 .
  • the machine learning model training engine 210 may train a machine learning model in multiple stages. In a first stage, the training engine 210 may use generalized data representing hands or hand gestures taken from multiple users. In a second stage, the training engine 210 may use user-specific data representing the hand or hand gestures of the user of mobile client 100 to further optimize the gesture tracking performed by the gesture tracking module 220 to a user.
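  • A sketch of the two-stage training described above, assuming an incrementally trainable classifier (an illustrative choice, not necessarily the model used): the first stage fits pooled data from many users, and the second stage continues training on the current user's own examples.

```python
from sklearn.linear_model import SGDClassifier

def train_two_stage(general_X, general_y, user_X, user_y, classes):
    """Stage 1: generalized data from many users. Stage 2: adapt to the current user."""
    model = SGDClassifier()
    model.partial_fit(general_X, general_y, classes=classes)  # first stage: generic hands/gestures
    model.partial_fit(user_X, user_y)                         # second stage: user-specific data
    return model
```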
  • the gesture detection model 221 and the hand detection model 222 may be machine learning models.
  • the models 221 and 222 may use various machine learning techniques such as linear support vector machine (linear SVM), boosting for other algorithms (e.g., AdaBoost), neural networks, logistic regression, naïve Bayes, memory-based learning, random forests, bagged trees, decision trees, boosted trees, boosted stumps, a supervised or unsupervised learning algorithm, or any suitable combination thereof.
  • the gesture detection model 221 classifies gestures within images or videos.
  • the gesture detection model 221 is trained using various images of one or more hand gestures (e.g., by machine learning model training engine 210 ).
  • data representing an image or video captured by the camera 102 is input into the gesture detection model 221 .
  • the gesture detection model 221 classifies one or more objects within the image or video. For example, the gesture detection model 221 classifies a hand making a five within an image whose data is input into the model.
  • the gesture detection model 221 detects a gesture performed over time (e.g., a hand wave). For example, consecutively captured images are input into the gesture detection model 221 to determine that the images represent a five at varying positions within the camera view, which the gesture tracking module 220 may then classify as a hand wave.
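  • One way the per-frame classifications could be combined into a wave, sketched as a simple heuristic (an assumption for illustration; the description does not specify this logic): a sequence of "five" classifications whose horizontal position shifts back and forth is treated as a hand wave.

```python
def looks_like_wave(frames, min_travel=0.15):
    """frames: list of (label, x_position) pairs with x normalized to [0, 1]."""
    if len(frames) < 3 or any(label != "five" for label, _ in frames):
        return False
    xs = [x for _, x in frames]
    # count reversals in horizontal motion (a wave moves back and forth)
    reversals = sum(
        (xs[i + 1] - xs[i]) * (xs[i] - xs[i - 1]) < 0 for i in range(1, len(xs) - 1)
    )
    return reversals >= 1 and (max(xs) - min(xs)) >= min_travel
```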
  • the hand detection model 222 classifies the presence of hands within images or videos.
  • the hand detection model 222 is trained using various images of hands (e.g., by machine learning model training engine 210 ). Similar to the gesture detection model 221 , data representing an image or video captured by the camera 102 may be input into the hand detection model 222 .
  • the hand detection model 222 classifies an object within the image or video as a hand or outputs an indication that no hand was detected.
  • the hand detection model 222 uses clusters of feature points identified from the camera view, where the feature points are associated with a physical object in the environment and the cluster corresponds to a physical surface of the object (e.g., a hand).
  • the hand detection model 222 may use the feature points (e.g., a combination of three-dimensional Cartesian coordinates and the corresponding depths) to identify clusters corresponding to a hand or to what is not a hand.
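  • A sketch of grouping feature points into candidate surfaces, assuming the points arrive as (x, y, depth) coordinates; DBSCAN is an illustrative clustering choice and is not stated in the description to be the method used by the hand detection model 222.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_feature_points(points, eps=0.05, min_samples=10):
    """points: (N, 3) array of feature point coordinates; returns candidate clusters."""
    points = np.asarray(points, dtype=float)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    # each non-noise label is a cluster a downstream check could classify as hand / not hand
    return [points[labels == k] for k in set(labels) if k != -1]
```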
  • the gesture tracking module 220 may use the output of the gesture detection model 221 or the hand detection model 222 to generate instructions or prompts for display to the user (e.g., via the screen 103 of the mobile client 100 ).
  • the instructions may include a suggested movement that the user should make or position that the user should be in within the camera view of the mobile client 100 to detect a hand gesture.
  • the gesture tracking module 220 may generate an outline of a hand or hand gesture on the screen 103 to guide a user to form a hand gesture.
  • the camera 102 captures images of the user's hand as it aligns with the displayed guide.
  • the captured images may be input into the gesture detection model 221 or the hand detection model 222 . If a gesture or hand is not detected within the images, the gesture tracking module 220 may generate a notification at the mobile client 100 that the detection failed, instructions to guide the user to make a desired hand gesture, or a combination thereof.
  • One or more of the gesture detection model 221 or the hand detection model 222 may output a classification based on a success threshold.
  • the models may determine that the likelihood of a classification (e.g., a confidence score) of a particular gesture or of the presence of a hand must meet or exceed a success threshold. For example, the gesture detection model 221 determines that a “five” gesture is being made with 60% likelihood and determines, based on a success threshold of 90%, that there is not a “five” gesture present in the input image or video. In another example, the hand detection model 222 determines that a hand is present with 95% likelihood and determines, based on a success threshold of 80%, that there is indeed a hand present in the input image or video.
  • the success threshold for each model of the gesture tracking module 220 may be different.
  • a success threshold may be user-specified or adjustable. For example, a user may lower the success threshold used by the models of the gesture tracking module 220 to increase the likelihood that his hand gesture will be identified. In this example, the user may increase the flexibility afforded when the user's hand is not placed within the camera view at a sufficiently proper angle or position, and a model may be more likely to determine the user's gesture as one that the user is indeed making.
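  • The threshold check described above reduces to a simple comparison, sketched here with hypothetical per-model thresholds (the 90% and 80% values come from the examples in this description):

```python
def accept_classification(label, confidence, thresholds, default=0.9):
    """Return the label if its confidence meets or exceeds the applicable success threshold."""
    return label if confidence >= thresholds.get(label, default) else None

# Examples mirroring the description: a "five" at 60% fails a 90% threshold,
# while a hand detected at 95% clears an 80% threshold.
assert accept_classification("five", 0.60, {"five": 0.90}) is None
assert accept_classification("hand", 0.95, {"hand": 0.80}) == "hand"
```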
  • the gesture detection model 221 and the hand detection model 222 may be one model.
  • the gesture detection model 221 may be used to both detect the presence of a hand within an image and classify the gesture made by the present hand.
  • use of the hand detection model 222 may improve processing bandwidth and thus, power consumption, of the mobile client 100 by enabling the gesture tracking application 130 (e.g., the gesture tracking module 220 ) to discard image frames of videos where a hand is not present.
  • Video processing consumes a large amount of a mobile client's processing bandwidth, and by discarding frames that do not have information relevant to the gesture tracking application 130 , subsequent processing of image frames is reserved for images that contain valuable information (e.g., a hand gesture being made).
  • the application of both the hand detection model 222 and the gesture detection model 221 may reduce the processing cycles and power consumption of the mobile client 100 .
  • applying the hand detection model 222 to an image before applying the gesture detection model may save processing cycles.
  • the gesture detection model 221 may be able to classify a large number of gestures (e.g., both still and moving gestures) and thus, requires complex processing with each application to an image in order to determine which of the many gestures is present within the image.
  • the hand detection model 222 may be simpler, as detecting a hand by its outline may be simpler than determining the gesture being made. Accordingly, first applying the hand detection model 222 , which requires less processing bandwidth, and then applying the gesture detection model 221 , which requires more processing bandwidth, only if a hand has been detected by the model 222 allows the mobile client 100 to reserve its processing resources.
  • the AR object state module 230 may hold in memory the current state of the AR object or previous states of the AR objects to determine a subsequent state based on the user's hand gesture and the current or previous states of the AR object.
  • a conditional decision algorithm may be used by the AR object state module 230 to determine a state to instruct the rendering module 240 to render the AR object in.
  • the AR object state module 230 may receive custom mappings of states for AR objects and hand gestures from the user. For example, a user may specify that an AR object corresponding to a shield is to be rendered during game play when the user waves his hand and that the rendering module 240 should stop rendering the shield when the user waves his hand again. States of an AR object and the application of the AR object state module 230 in determining those states using detected hand gestures are further described in the descriptions of FIGS. 4, 5A, and 5B .
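  • A minimal sketch of such a conditional state update, including a user-defined toggle of the kind described (a wave shows a shield, waving again hides it); the dictionary-based state and mapping are illustrative assumptions rather than the module's actual implementation.

```python
def next_state(current_state, gesture, custom_mappings):
    """current_state: dict of AR object properties; custom_mappings: gesture -> action."""
    action = custom_mappings.get(gesture)
    if action == "toggle_shield":
        shown = not current_state.get("shield_shown", False)   # wave toggles the shield on/off
        return {**current_state, "shield_shown": shown}
    if gesture == "fist" and current_state.get("ball_on_hand", False):
        return {**current_state, "ball_motion": "upward"}       # cf. FIGS. 5A and 5B
    return current_state                                        # unmapped gesture: keep the state
```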
  • the rendering module 240 provides for display, on the mobile client 100 , an augmented reality (AR) object that may be controlled by a user's gestures.
  • the AR engine 120 generates the AR object.
  • the rendering module 240 displays the AR object in a state based on a detected hand gesture.
  • a “state” of an AR object refers to a position, angle, shape, or any suitable condition of appearance that may change over time.
  • the rendering module 240 may render a ball in a first state where the ball is not aflame or in a second state where the ball is aflame.
  • the rendering module 240 may render the ball in a first state where the ball is located at a first set of Cartesian coordinates and a first depth in a virtual coordinate space corresponding to the environment or in a second state where the ball is located at a second set of Cartesian coordinates and a second depth in the virtual coordinate space.
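  • As a concrete (and assumed) representation of such a state, a position in the virtual coordinate space plus appearance flags could be carried in a small record; the field names here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ARObjectState:
    x: float        # Cartesian coordinates in the virtual coordinate space
    y: float
    depth: float    # depth within the virtual coordinate space
    aflame: bool = False

# e.g., the ball not aflame at one location versus aflame at another
first_state = ARObjectState(x=0.2, y=0.5, depth=1.0)
second_state = ARObjectState(x=0.6, y=0.4, depth=1.5, aflame=True)
```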
  • FIG. 3 is a flowchart illustrating a process 300 for controlling an AR object using gesture detection, in accordance with at least one example embodiment.
  • the process 300 may be performed by the gesture tracking application 130 .
  • the gesture tracking application 130 may perform operations of the process 300 in parallel or in different orders, or may perform different, additional, or fewer steps.
  • the gesture tracking application 130 may generate for display on the mobile client an instruction to position a user's hand within a certain area of the camera view of the mobile client (e.g., in order to successfully capture the hand within the image).
  • the gesture tracking application 130 receives 302 , from a mobile client, an image from a camera view of an environment, where the image depicts a portion of a body of a user.
  • the gesture tracking module 220 may perform the receiving 302 .
  • the mobile client 100 provides an image of a user's hand, from the camera view captured by the camera 102 , within an environment of the user's living room.
  • the gesture tracking application 130 provides 304 the image to a machine learning model configured to identify a formation of the portion of the body.
  • the gesture tracking module 220 may apply the image data of the received 302 image to the gesture detection model 221 .
  • the gesture detection model 221 may identify a formation being made by the hand within the image and classify the formation into one of multiple potential formations (e.g., as specified by a user through labels corresponding to the potential formations that were used to train the gesture detection model 221 ).
  • the gesture tracking application 130 provides 306 for display on the mobile client, based on an identification of the formation by the machine learning model, an AR object in the camera view of the environment.
  • the rendering module 240 may perform the providing 306 .
  • the gesture tracking application 130 may determine a state that the AR object is to be rendered in. For example, the AR object state module 230 uses a conditional decision tree that indicates that, if a particular formation is identified, then the rendering module 240 is to display a corresponding AR object and in a corresponding state.
  • FIG. 4 is a flowchart illustrating a process 400 for controlling the AR object using gesture detection based on the process 300 of FIG. 3 , in accordance with at least one example embodiment.
  • the process 400 includes subprocesses of the process 300 .
  • the process 400 may be performed by the gesture tracking application 130 .
  • the gesture tracking application 130 may perform operations of the process 400 in parallel or in different orders, or may perform different, additional, or fewer steps. For example, after determining 406 what formation was identified, the gesture tracking application 130 may access a previous formation identified and determine, using both the previous and the current formation, a state in which to generate the AR object.
  • the gesture tracking application 130 receives 402 , from a mobile client, an image from a camera view of an environment, the image depicting a hand of a user.
  • the hand captured within the received 402 image is one example of a portion of the body of the user received 302 in the process 300 .
  • the gesture tracking application 130 applies 404 a machine learning model to the image, the machine learning model trained on training image data representative of hand formations, the machine learning model configured to identify a formation of the hand in the image as one of the hand formations.
  • the machine learning model applied 404 is one example of a machine learning model applied 304 in the process 300 .
  • the machine learning model may be a convolutional neural network trained (e.g., by the machine learning model training engine 210 ) using various images of hands making particular formations (e.g., fists and fives).
  • the machine learning model may output a classification that the identified formation is a fist (e.g., palm covered over by fingers curled in).
  • the gesture tracking application 130 determines 406 whether a first or a second hand formation is identified by the machine learning model. The gesture tracking application may make this determination using a mapping between formations and AR object states, as described in the description of the AR object state module 230 . If a first hand formation is identified, the gesture tracking application 130 provides 408 , for display on the mobile client, the AR object in a first state from the AR engine in the camera view of the environment. If a second hand formation is identified, the gesture tracking application 130 provides 410 , for display on the mobile client, the AR object in a second state from the AR engine in the camera view of the environment.
  • the gesture tracking application may use the rendering module 240 which receives instructions to render a particular AR object in a particular state from the AR object state module 230 and transmits instructions to the AR engine 120 to render the AR object accordingly at the screen 103 of the mobile client 100 .
  • the determination 406 and either provided 408 or 410 AR object for display may be one example of the provided 306 AR object for display of the process 300 .
  • FIGS. 5A and 5B illustrate user interactions with an AR application that integrates gesture tracking to control AR objects, in accordance with at least one embodiment.
  • FIG. 5A shows a first user interaction 500 a where a user's hand 510 a is in a first state and is captured, by the mobile client 100 , within an image (e.g., camera-captured hand 510 b ).
  • the gesture tracking application 130 renders an AR object 520 (e.g., a ball) for display on the mobile client 100 such that the AR object 520 appears integrated into the environment with the user's hand 510 a captured within the camera view.
  • the AR object 520 is rendered for display (e.g., by the rendering module 240 ) in a first state where it is at a position overlaying the user's hand.
  • the gesture tracking application 130 may determine a position within a virtual coordinate space that the user's hand 510 a is located and determine a corresponding location to render the AR object 520 (e.g., a set of coordinates causing the AR object 520 to appear above the user's hand 510 a ).
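  • A sketch of that placement step, assuming the hand's location is available as (x, y, depth) in the virtual coordinate space and that y increases upward; the offset value is an illustrative assumption.

```python
def place_object_above_hand(hand_position, vertical_offset=0.12):
    """Return coordinates that make the AR object appear to float just above the hand."""
    x, y, depth = hand_position
    return (x, y + vertical_offset, depth)   # same x and depth, shifted slightly above the hand
```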
  • the gesture tracking application 130 provides a gesture indicator 530 for display that indicates the gesture identified by the gesture tracking application (e.g., by the gesture detection model 221 ).
  • the gesture indicator 530 indicates that the user's hand 510 a is making a five.
  • the gesture tracking application 130 may highlight, circle, or otherwise visually distinguish a gesture indicator from other indicators to inform the user of presently detected gestures within the camera view.
  • FIG. 5B shows a second user interaction 500 b where the user's hand 510 a is in a second state and is captured, by the mobile client 100 , within an image (e.g., camera-captured hand 510 b ).
  • the gesture tracking application 130 provides a gesture indicator 540 for display that indicates the gesture identified by the gesture detection model 221 is a fist during the second user interaction 500 b .
  • the gesture tracking application 130 renders the AR object 520 for display in a second state where the object appears to move upward within the camera view, as indicated by the state change indicator 550 included in FIG. 5B for clarity and not necessarily rendered by the gesture tracking application 130 for display to the user.
  • the gesture tracking application 130 may use a previous state and the present gesture of a user's hand to determine a subsequent state in which to render an AR object (e.g., using a mapping table or conditional decision tree). For example, the gesture tracking application 130 determines that a combination of the identified fist and the existing position of the AR object 520 over the user's hand indicates that the next state in which the AR object is to be rendered is appearing to move upward.
  • FIG. 6 is a block diagram illustrating components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller).
  • FIG. 6 shows a diagrammatic representation of a machine in the example form of a computer system 600 within which program code (e.g., software) for causing the machine to perform any one or more of the methodologies discussed herein may be executed.
  • the program code may correspond to functional configuration of the modules and/or processes described with FIGS. 1-5B .
  • the program code may be comprised of instructions 624 executable by one or more processors 602 .
  • the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine may be a portable computing device or machine (e.g., smartphone, tablet, wearable device (e.g., smartwatch)) capable of executing instructions 624 (sequential or otherwise) that specify actions to be taken by that machine.
  • the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 624 to perform any one or more of the methodologies discussed herein.
  • the example computer system 600 includes a processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 604 , and a static memory 606 , which are configured to communicate with each other via a bus 608 .
  • the computer system 600 may further include visual display interface 610 .
  • the visual interface may include a software driver that enables displaying user interfaces on a screen (or display).
  • the visual interface may display user interfaces directly (e.g., on the screen) or indirectly on a surface, window, or the like (e.g., via a visual projection unit). For ease of discussion the visual interface may be described as a screen.
  • the visual interface 610 may include or may interface with a touch enabled screen.
  • the computer system 600 may also include alphanumeric input device 612 (e.g., a keyboard or touch screen keyboard), a cursor control device 614 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 616 , a signal generation device 618 (e.g., a speaker), and a network interface device 620 , which also are configured to communicate via the bus 608 .
  • the storage unit 616 includes a machine-readable medium 622 on which is stored instructions 624 (e.g., software) embodying any one or more of the methodologies or functions described herein.
  • the instructions 624 (e.g., software) may also reside, completely or at least partially, within the main memory 604 or within the processor 602 (e.g., within a processor's cache memory) during execution thereof by the computer system 600 , the main memory 604 and the processor 602 also constituting machine-readable media.
  • the instructions 624 (e.g., software) may be transmitted or received over a network 626 via the network interface device 620 .
  • While the machine-readable medium 622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 624 ).
  • the term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 624 ) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein.
  • the term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
  • a user may desire a more seamless experience by controlling AR objects with his hands (e.g., using hand gestures), as the user is accustomed to doing in reality.
  • While conventional AR systems achieve this with dedicated, handheld controllers, a wired power source, and enough hardware real estate to accommodate powerful, power-hungry processors, mobile clients do not share those specifications and cannot afford gesture tracking in that conventional manner. Rather, a mobile device is limited in its power and processing resources.
  • the embodiments herein optimize for a mobile device's power and processing constraints by limiting the image frames processed during gesture detection (e.g., discarding image frames that a machine learning model determines do not depict a hand).
  • the methods described herein enable gesture tracking for AR object control on mobile client rendered AR systems without consuming excessive amounts of processing power and present an immersive AR experience to the user.
  • Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules.
  • a hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner.
  • In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
  • a hardware module may be implemented mechanically or electronically.
  • a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations.
  • a hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
  • hardware module should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
  • “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
  • Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
  • processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions.
  • the modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
  • the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations.
  • the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs).)
  • the performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines.
  • the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
  • any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
  • Coupled and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
  • the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion.
  • a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
  • “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Optics & Photonics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An augmented reality (AR) system hosted and executed on a mobile client enables control of AR objects using gestures. The system receives, from the mobile client, an image from a camera view (e.g., the mobile client's camera) of an environment, where the image depicts a user's hand. The system applies a machine learning model to the received image. The machine learning model identifies a formation of the hand. The system determines to render an AR object based on the identified formation. For example, a user forming a fist with his hand may cause an AR ball to move upward within the screen of the mobile client.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of U.S. Provisional Application No. 62/971,766, filed Feb. 7, 2020, which is incorporated by reference in its entirety.
  • TECHNICAL FIELD
  • The disclosure generally relates to the field of mobile rendered augmented reality and more specifically to object control based on gesture tracking in mobile rendered augmented reality environments.
  • BACKGROUND
  • Conventional augmented reality (AR) systems use handheld controllers to track a user's hands and determine whether the user is making a hand gesture. However, the tracked gestures are limited to gestures that can be made with a controller in the user's hands. For example, the user cannot make a “five” gesture (i.e., palm open and fingers/thumb extended) without dropping the controller. Mobile devices may execute AR applications without handheld controllers, but execution of the applications slows. This is because a mobile device has limited processing bandwidth and/or limited battery capacity for such intensive processing. Hence, there is a lack of an AR system that can track user gestures without demanding large processing bandwidth or excessively consuming a device's power.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The disclosed embodiments have other advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.
  • Figure (FIG.) 1 illustrates an augmented reality (AR) system environment, in accordance with at least one embodiment.
  • FIG. 2 is a block diagram of the gesture tracking application of FIG. 1, in accordance with at least one embodiment.
  • FIG. 3 is a flowchart illustrating a process for controlling an AR object using gesture detection, in accordance with at least one embodiment.
  • FIG. 4 is a flowchart illustrating a process for controlling the AR object using gesture detection based on the process of FIG. 3, in accordance with at least one embodiment.
  • FIGS. 5A and 5B illustrate user interactions with an AR application that integrates gesture tracking to control AR objects, in accordance with at least one embodiment.
  • FIG. 6 illustrates a block diagram including components of a machine able to read instructions from a machine-readable medium and execute them in a processor (or controller), in accordance with at least one embodiment.
  • DETAILED DESCRIPTION
  • The Figures (FIGS.) and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
  • Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
  • Configuration Overview
  • In one example embodiment of a disclosed system, method and computer readable storage medium, an augmented reality (AR) object is rendered on a mobile client based on a user's gestures identified within a camera view of the mobile client. Conventional gesture tracking solutions for AR are designed for handheld hardware to track where a user's hands are and determine if a gesture is being made. Accordingly, described is a configuration that enables gesture tracking to control an AR object in a mobile rendered AR system while optimizing for the power and processing constraints of the mobile client.
  • In one example configuration, a camera coupled with the mobile client (e.g., integrated with the mobile client or connected to the mobile client by a wired or wireless connection) captures a camera view of an environment. The environment may correspond to the physical world and may include a portion of a user's body positioned within a field of view of the camera. A processor (e.g., of the mobile device) processes program code that causes the processor to execute specified functions as are further described herein. Accordingly, the processor receives an image from the camera view of the environment and applies a machine learning model to the received image. Using the machine learning model, the processor identifies a gesture, which may also be referred to herein as a “formation,” made by the hand depicted within the image. The processor determines, based on the identified gesture, a state in which to render an AR object (e.g., from an AR engine). The processor provides for display (e.g., transmits instructions (e.g., program code or software) to render on a screen) the rendered object in the environment (e.g., with the user's hand) to the mobile client. In some embodiments, a user interacts with the displayed AR objects.
  • In some embodiments, if the processor does not detect a gesture or a hand within a received image, the processor may determine not to store the image in memory, or to remove the image from memory, so that the processor does not expend further processing resources on the frame (e.g., because the frame does not depict an informative object). In some embodiments, the machine learning model for detecting gestures may require more processing resources than a machine learning model for detecting hands. In these circumstances, the processor may optimize the mobile client's processing resources by applying a hand detection model to an image first rather than directly applying a gesture tracking model.
  • Gesture tracking allows the user to use a mobile device to control and interact with AR objects without dedicated hardware, as though the user were interacting with the surrounding physical world, presenting an immersive gaming experience for the user. In particular, the methods described herein allow for AR object control using gesture tracking on a mobile client without consuming excessive processing and/or battery power.
  • Augmented Reality System Environment
  • Figure (FIG.) 1 illustrates an augmented reality (AR) system environment, in accordance with at least one embodiment. The AR system environment enables AR applications on a mobile client 100, and in some embodiments, presents immersive experiences to users via gesture tracking. The system environment includes a mobile client 100, an AR system 110, an AR engine 120, a gesture tracking application 130, a database 140, and a network 150. The AR system 110, in some example embodiments, may include the mobile client 100, the AR engine 120, the gesture tracking application 130, and the database 140. In other example embodiments, the AR system 110 may include the AR engine 120, the gesture tracking application 130, and the database 140, but not the mobile client 100, such that the AR system 110 communicatively couples (e.g., via wireless communication) to the mobile client 100 from a remote server.
  • The mobile client 100 is a mobile device that is or incorporates a computer. The mobile client may be, for example, a relatively small computing device in which network, processing (e.g., processor and/or controller), and power resources (e.g., battery) may be limited and that has a form factor such as a smartphone, tablet, wearable device (e.g., smartwatch), and/or portable internet-enabled device. The limitations of such devices stem from the design constraints of building products that are portable and that operate away from constant power sources.
  • The mobile client 100 may be a computing device that includes the components of the machine depicted in FIG. 6. The mobile client 100 has general and/or special purpose processors, memory, storage, and networking components (either wired or wireless). The mobile client 100 can communicate over one or more communication connections (e.g., a wired connection such as ethernet, or wireless communication via cellular signal (e.g., LTE, 5G), WiFi, or satellite) and includes a global positioning system (GPS) used to determine a location of the mobile client 100.
  • The mobile client 100 also includes one or more cameras 102 that can capture forward and rear facing images and/or videos. To capture images for gesture or hand detection, the camera 102 may be a two-dimensional (2D) camera as opposed to a stereo camera or a three-dimensional (3D) camera. That is, the machine-learned detection described herein does not necessarily require a 3D image or depth to classify a hand or a gesture depicted within an image.
  • The mobile client 100 also includes a screen (or display) 103 and a display driver to provide for display interfaces on the screen 103 associated with the mobile client 100. The mobile client 100 executes an operating system, such as GOOGLE ANDROID OS and/or APPLE iOS, and includes the screen 103 and/or a user interface that the user can interact with. In some embodiments, the mobile client 100 couples to the AR system 110, which enables it to execute an AR application (e.g., the AR client 101).
  • The AR engine 120 interacts with the mobile client 100 to execute the AR client 101 (e.g., an AR game). For example, the AR engine 120 may be a game engine such as UNITY and/or UNREAL ENGINE. The AR engine 120 displays, and the user interacts with, the AR game via the mobile client 100. For example, the mobile client 100 may host and execute the AR client 101 that in turn accesses the AR engine 120 to enable the user to interact with the AR game. Although the AR application refers to an AR gaming application in many instances described herein, the AR application may be a retail application integrating AR for modeling purchasable products, an educational application integrating AR for demonstrating concepts within a learning curriculum, or any suitable interactive application in which AR may be used to augment the interactions. In some embodiments, the AR engine 120 is integrated into and/or hosted on the mobile client 100. In other embodiments, the AR engine 120 is hosted external to the mobile client 100 and communicatively couples to the mobile client 100 over the network 150. The AR system 110 may comprise program code that executes functions as described herein.
  • In some example embodiments, the AR system 110 includes the gesture tracking application 130. The gesture tracking application enables gesture tracking in the AR game such that AR objects (e.g., virtual objects rendered by the AR engine 120) and their behaviors or states may be controlled by the user. The user may capture an image and/or video of an environment captured within a camera view of the camera 102 of the mobile client 100. An image from the camera view may depict a portion of the user's body such as the user's hand. The AR engine 120 renders an AR object, where the rendering may be based on gestures that the gesture tracking application 130 has determined that the user is performing (e.g., a fist). The gesture tracking application 130 may detect or track (e.g., detecting changes in gestures over time) a variety of gestures such as a five (i.e., an open palm facing away from the user), a fist, pointing, waving, facial gestures (e.g., smiles, open mouth, etc.), lifting a leg, bending, kicking, or any suitable movement made by any portion of the body to express an intention. While the gesture tracking application 130 is described herein as primarily tracking hand gestures, the gesture tracking application 130 may detect various gestures as described above. As referred to herein, “gesture” and “formation” may be used interchangeably.
  • During use of the AR client 101 (e.g., during game play), the gesture tracking application 130 identifies a body part (e.g., a hand) within an image from a camera view captured by the camera 102, determines a gesture made by the body part (e.g., a five), and renders an AR object based on the determined gesture. The gesture tracking application 130 may instruct the camera 102 to capture image frames periodically (e.g., every three seconds). In some embodiments, the state in which the AR engine 120 displays the AR object depends on input from the user during game play. For example, the direction of the AR object's movement may change depending on the user's gestures. FIGS. 5A and 5B, described further herein, provide details on how gesture tracking may be used to control AR objects in the AR system 110. In some embodiments, the AR system 110 includes applications instead of and/or in addition to the gesture tracking application 130. In some embodiments, the gesture tracking application 130 may be hosted on and/or executed by the mobile client 100. In other embodiments, the gesture tracking application 130 is communicatively coupled to the mobile client 100.
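  • Purely by way of illustration, the periodic frame capture described above may be pictured as a simple polling loop. The sketch below is not part of the disclosure: the camera, gesture_app, capture, and process_frame names are hypothetical stand-ins, and the three-second period is only the example interval given above.

```python
import time

def capture_loop(camera, gesture_app, period_s=3.0, stop=lambda: False):
    """Capture an image frame every period_s seconds and pass it to the
    gesture tracking pipeline (hypothetical interfaces)."""
    while not stop():
        image = camera.capture()           # frame from the camera view
        gesture_app.process_frame(image)   # hand/gesture detection and rendering
        time.sleep(period_s)               # e.g., every three seconds
```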
  • The database 140 stores images or videos that may be used by the gesture tracking application 130 to detect a user's hand and determine the gesture the hand is making. The mobile client 100 may transmit images or videos collected by the camera 102 during the execution of the AR client 101 to the database 140. The data stored within the database 140 may be collected from a single user (e.g., the user of the mobile client 100) or multiple users (e.g., users of other mobile clients that are communicatively coupled to the AR system 110 through the network 150). The gesture tracking application 130 may use images and/or videos of gestures stored in the database 140 to train a model (e.g., a neural network). In particular, the machine learning model training engine 210 of the gesture tracking application 130 may access the database 140 to train a machine learning model. This is described in further detail in the description of FIG. 2.
  • The database 140 may store a mapping of gestures to AR objects and/or states in which the AR objects may be rendered. The gesture tracking application 130 may determine a gesture made by a user's hand as captured within a camera view of the mobile client 100 and access the database 140 to determine, using the determined gesture, that an AR object should be rendered in a particular state (e.g., an AR ball floating upward). The database 140 may store one or more user profiles, each user profile including user customizations or settings that personalize the user's experience using the AR client 101. For example, a user profile stored within the database 140 may store a user-specified name of a custom hand gesture, images of the customized hand gesture (e.g., taken by the user using the mobile client 100), and a user-specified mapping of a gesture to the customized hand gesture.
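  • A minimal sketch of such a gesture-to-state lookup follows. The dictionary contents and the names DEFAULT_GESTURE_STATE_MAP, lookup_state, and custom_gestures are hypothetical; they serve only to illustrate how a stored mapping and a user-profile override might be consulted.

```python
# Hypothetical default mapping from detected gestures to AR object states.
DEFAULT_GESTURE_STATE_MAP = {
    "five": {"object": "ball", "state": "hover_over_hand"},
    "fist": {"object": "ball", "state": "move_upward"},
}

def lookup_state(gesture, user_profile=None):
    """Return the AR object and state mapped to a detected gesture.

    A stored user profile may override the defaults with a user-specified
    mapping for a customized hand gesture.
    """
    if user_profile:
        custom = user_profile.get("custom_gestures", {})
        if gesture in custom:
            return custom[gesture]
    return DEFAULT_GESTURE_STATE_MAP.get(gesture)

print(lookup_state("fist"))  # {'object': 'ball', 'state': 'move_upward'}
```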
  • The network 150 transmits data between the mobile client 100 and the AR system 110. The network 150 may be a local area and/or wide area network that uses wired and/or wireless communication systems, such as the internet. In some embodiments, the network 150 includes encryption capabilities to ensure the security of data, such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), internet protocol security (IPsec), etc.
  • Example Gesture Tracking Application
  • FIG. 2 is a block diagram of the gesture tracking application 130 of FIG. 1, in accordance with at least one example embodiment. The gesture tracking application 130 includes a machine learning model training engine 210, a gesture tracking module 220, an AR object state module 230, and a rendering module 240. The gesture tracking module 220 further includes a gesture detection model 221 and a hand detection model 222. In some embodiments, the gesture tracking application 130 includes modules other than those shown in FIG. 2. The modules may be embodied as program code (e.g., software comprised of instructions stored on a non-transitory computer readable storage medium and executable by at least one processor such as the processor 602 in FIG. 6) and/or hardware (e.g., application specific integrated circuit (ASIC) chips or field programmable gate arrays (FPGAs) with firmware). When executed or operated, the modules provide at least the functionality described herein.
  • The process of detecting a gesture and modifying the state of an AR object based on the detected gesture may begin with the gesture tracking module 220 receiving an image from the mobile client 100. In one example, the image may be taken by the camera 102 and transmitted to the gesture tracking application 130 by the AR client 101 to determine whether a gesture is depicted within the image to control an AR object. The gesture tracking module 220 may apply one or more trained models, such as the gesture detection model 221 and the hand detection model 222, to determine the gesture depicted in the received image. The models may be trained by the machine learning model training engine 210. After classifying the gesture (e.g., determining the user is making a five), the gesture tracking module 220 may provide the classification to the AR object state module 230, which subsequently determines a state in which an AR object should be rendered. The AR object state module 230 provides the determined state to the rendering module 240, which may request an AR object in a particular state from the AR engine 120 and provide the AR object received from the AR engine 120 for display at the screen 103 of the mobile client 100.
  • The machine learning model training engine 210 applies training data sets to the gesture detection model 221 or the hand detection model 222. The training engine 210 may create training data sets based on data from the database 140. The training data sets may include positive or negative samples of hand gestures or hands. The training data sets may be labeled according to the presence, or lack thereof, of a hand gesture or hand. The labels may be provided to the training engine 210 from a user (e.g., using a user input interface of the mobile client 100). This may enable the training engine 210 to train, for example, the gesture detection model 221 to classify an image of a custom user gesture according to a user-specified label.
  • For the gesture detection model 221 to classify customized gestures, the machine learning model training engine 210 may create a training data set using images of a custom gesture. The gesture tracking module 220 may prompt (e.g., through the screen 103) the user to make a custom gesture and instruct the user to capture multiple images (e.g., in different positions or angles) of the gesture using the camera 102. The gesture tracking module 220 receives the captured images and transmits them to the database 140 for storage. The machine learning model training engine 210 may use the captured images to train the gesture detection model 221 or the hand detection model 222.
  • The machine learning model training engine 210 may train a machine learning model in multiple stages. In a first stage, the training engine 210 may use generalized data representing hands or hand gestures taken from multiple users. In a second stage, the training engine 210 may use user-specific data representing the hand or hand gestures of the user of mobile client 100 to further optimize the gesture tracking performed by the gesture tracking module 220 to a user.
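  • The two-stage training described above can be sketched as pretraining on generalized data followed by adaptation on user-specific data. The snippet below assumes, purely for illustration, a scikit-learn classifier with incremental fitting and random arrays standing in for image features; it is not the training procedure of the disclosure.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier  # illustrative model choice

def train_two_stage(general_X, general_y, user_X, user_y, classes):
    model = SGDClassifier()
    # Stage 1: generalized hand/gesture data aggregated from multiple users.
    model.partial_fit(general_X, general_y, classes=classes)
    # Stage 2: user-specific data to adapt the model to one user's hand.
    model.partial_fit(user_X, user_y)
    return model

# Toy example with random feature vectors standing in for real image data.
rng = np.random.default_rng(0)
classes = np.array([0, 1])  # e.g., 0 = "fist", 1 = "five"
general_X, general_y = rng.normal(size=(200, 16)), rng.integers(0, 2, 200)
user_X, user_y = rng.normal(size=(20, 16)), rng.integers(0, 2, 20)
model = train_two_stage(general_X, general_y, user_X, user_y, classes)
```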
  • The gesture detection model 221 and the hand detection model 222 may be machine learning models. The models 221 and 222 may use various machine learning techniques such as linear support vector machine (linear SVM), boosting for other algorithms (e.g., AdaBoost), neural networks, logistic regression, naïve Bayes, memory-based learning, random forests, bagged trees, decision trees, boosted trees, boosted stumps, a supervised or unsupervised learning algorithm, or any suitable combination thereof.
  • The gesture detection model 221 classifies gestures within images or videos. The gesture detection model 221 is trained using various images of one or more hand gestures (e.g., by machine learning model training engine 210). In some embodiments, data representing an image or video captured by the camera 102 is input into the gesture detection model 221. The gesture detection model 221 classifies one or more objects within the image or video. For example, the gesture detection model 221 classifies a hand making a five within an image whose data is input into the model. In some embodiments, the gesture detection model 221 detects a gesture performed over time (e.g., a hand wave). For example, consecutively captured images are input into the gesture detection model 221 to determine that the images represent a five at varying positions within the camera view, which the gesture tracking module 220 may then classify as a hand wave.
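  • The time-based classification described above (a “five” drifting across consecutive frames being treated as a wave) may be sketched as follows; the function name, thresholds, and normalized positions are assumptions made only for illustration.

```python
def classify_wave(per_frame_results, min_frames=4, min_travel=0.2):
    """per_frame_results: list of (label, x_position) tuples, one per frame,
    where x_position is the normalized horizontal position of the hand."""
    positions = [x for label, x in per_frame_results if label == "five"]
    if len(positions) < min_frames:
        return None
    travel = max(positions) - min(positions)
    return "wave" if travel >= min_travel else "five"

# A "five" moving across the camera view is reported as a wave.
frames = [("five", 0.30), ("five", 0.40), ("five", 0.52), ("five", 0.61)]
print(classify_wave(frames))  # wave
```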
  • The hand detection model 222 classifies the presence of hands within images or videos. The hand detection model 222 is trained using various images of hands (e.g., by machine learning model training engine 210). Similar to the gesture detection model 221, data representing an image or video captured by the camera 102 may be input into the hand detection model 222. The hand detection model 222 classifies an object within the image or video as a hand or outputs an indication that no hand was detected. In some embodiments, the hand detection model 222 uses clusters of feature points identified from the camera view, where the feature points are associated with a physical object in the environment and the cluster corresponds to a physical surface of the object (e.g., a hand). The hand detection model 222 may use the feature points (e.g., a combination of three-dimensional Cartesian coordinates and the corresponding depths) to identify clusters corresponding to a hand or to what is not a hand.
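  • As one hedged illustration of grouping feature points into clusters that could correspond to a physical surface such as a hand, the sketch below applies DBSCAN to three-dimensional coordinates; the clustering method and parameter values are assumptions and not the specific technique of the disclosure.

```python
import numpy as np
from sklearn.cluster import DBSCAN  # illustrative clustering choice

def cluster_feature_points(points_xyz, eps=0.05, min_samples=10):
    """points_xyz: (N, 3) array of feature point coordinates (with depth)."""
    points_xyz = np.asarray(points_xyz)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_xyz)
    clusters = [points_xyz[labels == k] for k in set(labels) - {-1}]  # -1 = noise
    return clusters  # each cluster is a candidate surface (e.g., a hand)
```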
  • The gesture tracking module 220 may use the output of the gesture detection model 221 or the hand detection model 222 to generate instructions or prompts for display to the user (e.g., via the screen 103 of the mobile client 100). The instructions may include a suggested movement that the user should make or a position that the user should assume within the camera view of the mobile client 100 so that a hand gesture can be detected. For example, the gesture tracking module 220 may generate an outline of a hand or hand gesture on the screen 103 to guide a user to form a hand gesture. The camera 102 captures images of the user's hand as it aligns with the displayed guide. The captured images may be input into the gesture detection model 221 or the hand detection model 222. If a gesture or hand is not detected within the images, the gesture tracking module 220 may generate a notification at the mobile client 100 that the detection failed, instructions to guide the user to make a desired hand gesture, or a combination thereof.
  • One or more of the gesture detection model 221 or the hand detection model 222 may output a classification based on a success threshold. The models may require that the likelihood of a classification (e.g., a confidence score) of a particular gesture or of the presence of a hand meet or exceed a success threshold. For example, the gesture detection model 221 determines that a “five” gesture is being made with 60% likelihood and determines, based on a success threshold of 90%, that there is not a “five” gesture present in the input image or video. In another example, the hand detection model 222 determines that a hand is present with 95% likelihood and determines, based on a success threshold of 80%, that there is indeed a hand present in the input image or video. The success threshold for each model of the gesture tracking module 220 may be different. In some embodiments, a success threshold may be user-specified or adjustable. For example, a user may lower the success threshold used by the models of the gesture tracking module 220 to increase the likelihood that his hand gesture will be identified. In this example, lowering the threshold affords more flexibility when the user's hand is not placed within the camera view at an ideal angle or position, and a model may be more likely to determine the user's gesture as one that the user is indeed making.
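  • The success-threshold check described above reduces to a simple comparison, sketched below with hypothetical names and the example thresholds from this paragraph.

```python
def accept_classification(label, confidence, thresholds, default=0.9):
    """Return the label only if its confidence meets the applicable threshold."""
    return label if confidence >= thresholds.get(label, default) else None

thresholds = {"five": 0.9, "hand": 0.8}  # may be user-specified or adjusted
print(accept_classification("five", 0.60, thresholds))  # None: 60% < 90%
print(accept_classification("hand", 0.95, thresholds))  # 'hand': 95% >= 80%
```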
  • In some embodiments, the gesture detection model 221 and the hand detection model 222 may be one model. For example, the gesture detection model 221 may be used to both detect the presence of a hand within an image and classify the gesture made by the detected hand. In some embodiments, use of the hand detection model 222 may improve processing bandwidth, and thus power consumption, of the mobile client 100 by enabling the gesture tracking application 130 (e.g., the gesture tracking module 220) to discard image frames of videos where a hand is not present. Video processing consumes a large amount of a mobile client's processing bandwidth, and by discarding frames that do not contain information relevant to the gesture tracking application 130, subsequent processing of image frames is reserved for images that contain valuable information (e.g., a hand gesture being made).
  • The application of both the hand detection model 222 and the gesture detection model 221 may reduce the processing cycles and power consumption of the mobile client 100. In particular, applying the hand detection model 222 to an image before applying the gesture detection model may save processing cycles. In some embodiments, the gesture detection model 221 may be able to classify a large number of gestures (e.g., both still and moving gestures) and thus, requires complex processing with each application to an image in order to determine which of the many gestures is present within the image. By contrast, the hand detection model 222 may be simpler, as detecting a hand by its outline may be simpler than determining the gesture being made. Accordingly, the initial application of the hand detection model 222 that requires less processing bandwidth prior to the subsequent application, if the hand has been detected by the model 222, of the gesture detection model 221 that requires more processing bandwidth allows the mobile client 100 to reserve its processing resources.
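  • The ordering described above, in which the cheaper hand detection model gates the costlier gesture detection model, may be sketched as follows; contains_hand and classify are hypothetical interfaces of the trained models, not identifiers from the disclosure.

```python
def process_frame(image, hand_detector, gesture_classifier):
    """Run the lightweight hand detector first; only run the gesture
    classifier when a hand is present, otherwise discard the frame."""
    if not hand_detector.contains_hand(image):
        return None  # frame discarded; no further processing spent on it
    return gesture_classifier.classify(image)  # e.g., "fist" or "five"
```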
  • The AR object state module 230 may hold in memory the current state of the AR object or previous states of the AR object to determine a subsequent state based on the user's hand gesture and the current or previous states of the AR object. A conditional decision algorithm may be used by the AR object state module 230 to determine a state in which to instruct the rendering module 240 to render the AR object. The AR object state module 230 may receive custom mappings of states for AR objects and hand gestures from the user. For example, a user may specify that an AR object corresponding to a shield is to be rendered during game play when a user waves his hand and that the rendering module 240 should stop rendering the shield when the user waves his hand again. States of an AR object and the application of the AR object state module 230 in determining those states using detected hand gestures are further described in the descriptions of FIGS. 4, 5A, and 5B.
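  • One way to picture such a conditional decision algorithm is a transition table keyed on the current state and the detected gesture, as sketched below; the state names and table entries (including the shield example above) are illustrative assumptions only.

```python
TRANSITIONS = {
    # (current_state, detected_gesture) -> next_state
    ("no_shield", "wave"): "shield_on",
    ("shield_on", "wave"): "no_shield",
    ("ball_over_hand", "fist"): "ball_moving_up",
}

def next_state(current_state, gesture):
    """Return the next AR object state, defaulting to the current state."""
    return TRANSITIONS.get((current_state, gesture), current_state)

print(next_state("no_shield", "wave"))  # shield_on
print(next_state("shield_on", "wave"))  # no_shield
```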
  • The rendering module 240 provides for display, on the mobile client 100, an augmented reality (AR) object that may be controlled by a user's gestures. The AR engine 120 generates the AR object. When the AR object is at a location on a virtual coordinate space that represents the surfaces within the environment captured by the camera view of the camera 102, the rendering module 240 displays the AR object in a state based on a detected hand gesture. As referred to herein, a “state” of an AR object refers to a position, angle, shape, or any suitable condition of appearance that may change over time. For example, the rendering module 240 may render a ball in a first state where the ball is not aflame or in a second state where the ball is aflame. In another example, the rendering module 240 may render the ball in a first state where the ball is located at a first set of Cartesian coordinates and a first depth in a virtual coordinate space corresponding to the environment or in a second state where the ball is located at a second set of Cartesian coordinates and a second depth in the virtual coordinate space.
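  • As a minimal sketch of what a “state” could carry, assuming for illustration that it is represented as coordinates in the virtual coordinate space plus an example appearance attribute:

```python
from dataclasses import dataclass

@dataclass
class ARObjectState:
    x: float              # Cartesian coordinates in the virtual coordinate space
    y: float
    depth: float
    aflame: bool = False  # example appearance attribute

first_state = ARObjectState(0.1, 0.4, 1.2)                # ball, not aflame
second_state = ARObjectState(0.3, 0.6, 1.5, aflame=True)  # ball, aflame
```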
  • Processes for Controlling AR Objects Using Gesture Detection
  • FIG. 3 is a flowchart illustrating a process 300 for controlling an AR object using gesture detection, in accordance with at least one example embodiment. The process 300 may be performed by the gesture tracking application 130. The gesture tracking application 130 may perform operations of the process 300 in parallel or in different orders, or may perform different, additional, or fewer steps. For example, prior to receiving 302 the image, the gesture tracking application 130 may generate for display on the mobile client an instruction to position a user's hand within a certain area of the camera view of the mobile client (e.g., in order to successfully capture the hand within the image).
  • The gesture tracking application 130 receives 302, from a mobile client, an image from a camera view of an environment, where the image depicts a portion of a body of a user. The gesture tracking module 220 may perform the receiving 302. For example, the mobile client 100 provides an image of a user's hand, from the camera view captured by the camera 102, within an environment of the user's living room.
  • The gesture tracking application 130 provides 304 the image to a machine learning model configured to identify a formation of the portion of the body. For example, the gesture tracking module 220 may apply the image data of the received 302 image to the gesture detection model 221. The gesture detection model 221 may identify a formation being made by the hand within the image and classify the formation into one of multiple potential formations (e.g., as specified by a user through labels corresponding to the potential formations that were used to train the gesture detection model 221).
  • The gesture tracking application 130 provides 306 for display on the mobile client, based on an identification of the formation by the machine learning model, an AR object in the camera view of the environment. The rendering module 240 may perform the providing 306. The gesture tracking application 130 may determine a state that the AR object is to be rendered in. For example, the AR object state module 230 uses a conditional decision tree that indicates that, if a particular formation is identified, then the rendering module 240 is to display a corresponding AR object and in a corresponding state.
  • FIG. 4 is a flowchart illustrating a process 400 for controlling the AR object using gesture detection based on the process 300 of FIG. 3, in accordance with at least one example embodiment. The process 400 includes subprocesses of the process 300. Like the process 300, the process 400 may be performed by the gesture tracking application 130. The gesture tracking application 130 may perform operations of the process 400 in parallel or in different orders, or may perform different, additional, or fewer steps. For example, after determining 406 what formation was identified, the gesture tracking application 130 may access a previous formation identified and determine, using both the previous and the current formation, a state in which to generate the AR object.
  • The gesture tracking application 130 receives 402, from a mobile client, an image from a camera view of an environment, the image depicting a hand of a user. The hand captured within the received 402 image is one example of a portion of the body of the user received 302 in the process 300.
  • The gesture tracking application 130 applies 404 a machine learning model to the image, the machine learning model trained on training image data representative of hand formations, the machine learning model configured to identify a formation of the hand in the image as one of the hand formations. The machine learning model applied 404 is one example of a machine learning model applied 304 in the process 300. The machine learning model may be a convolutional neural network trained (e.g., by the machine learning model training engine 210) using various images of hands making particular formations (e.g., fists and fives). In one example, if the image received 402 depicts the hand making a fist, the machine learning model may output a classification that the identified formation is a fist (e.g., a palm covered by curled-in fingers).
  • The gesture tracking application 130 determines 406 whether a first or a second hand formation is identified by the machine learning model. The gesture tracking application may make this determination using a mapping between formations and AR object states, as described in the description of the AR object state module 230. If a first hand formation is identified, the gesture tracking application 130 provides 408, for display on the mobile client, the AR object in a first state from the AR engine in the camera view of the environment. If a second hand formation is identified, the gesture tracking application 130 provides 410, for display on the mobile client, the AR object in a second state from the AR engine in the camera view of the environment. To provide the AR object for display in a particular state, the gesture tracking application may use the rendering module 240 which receives instructions to render a particular AR object in a particular state from the AR object state module 230 and transmits instructions to the AR engine 120 to render the AR object accordingly at the screen 103 of the mobile client 100. The determination 406 and either provided 408 or 410 AR object for display may be one example of the provided 306 AR object for display of the process 300.
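  • The branch at 406-410 may be summarized by the sketch below, assuming hypothetical helper objects for the machine learning model, AR engine, and screen; “fist” and “five” stand in for the first and second hand formations.

```python
def process_400(image, gesture_model, ar_engine, screen):
    formation = gesture_model.identify(image)        # apply the model (404)
    if formation == "fist":                          # first hand formation (406)
        screen.display(ar_engine.render("ball", state="first_state"))   # 408
    elif formation == "five":                        # second hand formation (406)
        screen.display(ar_engine.render("ball", state="second_state"))  # 410
```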
  • Example AR Application with Gesture Tracking to Control AR Objects
  • FIGS. 5A and 5B illustrate user interactions with an AR application that integrates gesture tracking to control AR objects, in accordance with at least one embodiment. FIG. 5A shows a first user interaction 500 a where a user's hand 510 a is in a first state and is captured, by the mobile client 100, within an image (e.g., camera-captured hand 510 b). The gesture tracking application 130 renders an AR object 520 (e.g., a ball) for display on the mobile client 100 such that the AR object 520 appears integrated into the environment with the user's hand 510 a captured within the camera view.
  • In the first user interaction 500 a, the AR object 520 is rendered for display (e.g., by the rendering module 240) in a first state where it is at a position overlaying the user's hand. The gesture tracking application 130 may determine a position within a virtual coordinate space that the user's hand 510 a is located and determine a corresponding location to render the AR object 520 (e.g., a set of coordinates causing the AR object 520 to appear above the user's hand 510 a).
  • The gesture tracking application 130 provides a gesture indicator 530 for display that indicates the gesture identified by the gesture tracking application (e.g., by the gesture detection model 221). In the first user interaction 500 a, the gesture indicator 530 indicates that the user's hand 510 a is making a five. The gesture tracking application 130 may highlight, circle, or otherwise visually distinguish a gesture indicator from other indicators to inform the user of presently detected gestures within the camera view.
  • FIG. 5B shows a second user interaction 500 b where the user's hand 510 a is in a second state and is captured, by the mobile client 100, within an image (e.g., camera-captured hand 510 b). The gesture tracking application 130 provides a gesture indicator 540 for display that indicates that the gesture identified by the gesture detection model 221 during the second user interaction 500 b is a fist. The gesture tracking application 130 renders the AR object 520 for display in a second state where the object appears to move upward within the camera view, as indicated by the state change indicator 550, which is included in FIG. 5B for clarity and is not necessarily rendered by the gesture tracking application 130 for display to the user. In some embodiments, the gesture tracking application 130 may use a previous state and the present gesture of a user's hand to determine a subsequent state in which to render an AR object (e.g., using a mapping table or conditional decision tree). For example, the gesture tracking application 130 determines that a combination of the identified fist and the existing position of the AR object 520 over the user's hand indicates that the next state in which the AR object is to be rendered is appearing to move upward.
  • Computing Machine Architecture
  • FIG. 6 is a block diagram illustrating components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller). Specifically, FIG. 6 shows a diagrammatic representation of a machine in the example form of a computer system 600 within which program code (e.g., software) for causing the machine to perform any one or more of the methodologies discussed herein may be executed. The program code may correspond to functional configuration of the modules and/or processes described with FIGS. 1-5B. The program code may be comprised of instructions 624 executable by one or more processors 602. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • The machine may be a portable computing device or machine (e.g., smartphone, tablet, wearable device (e.g., smartwatch)) capable of executing instructions 624 (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 624 to perform any one or more of the methodologies discussed herein.
  • The example computer system 600 includes a processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 604, and a static memory 606, which are configured to communicate with each other via a bus 608. The computer system 600 may further include a visual display interface 610. The visual interface may include a software driver that enables displaying user interfaces on a screen (or display). The visual interface may display user interfaces directly (e.g., on the screen) or indirectly on a surface, window, or the like (e.g., via a visual projection unit). For ease of discussion, the visual interface may be described as a screen. The visual interface 610 may include or may interface with a touch enabled screen. The computer system 600 may also include an alphanumeric input device 612 (e.g., a keyboard or touch screen keyboard), a cursor control device 614 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 616, a signal generation device 618 (e.g., a speaker), and a network interface device 620, which also are configured to communicate via the bus 608.
  • The storage unit 616 includes a machine-readable medium 622 on which is stored instructions 624 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 624 (e.g., software) may also reside, completely or at least partially, within the main memory 604 or within the processor 602 (e.g., within a processor's cache memory) during execution thereof by the computer system 600, the main memory 604 and the processor 602 also constituting machine-readable media. The instructions 624 (e.g., software) may be transmitted or received over a network 626 via the network interface device 620.
  • While the machine-readable medium 622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 624). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 624) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein. The term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
  • Additional Configuration Considerations
  • While using an AR application, a user may desire a more seamless experience by controlling AR objects with his hands (e.g., using hand gestures), as the user is accustomed to doing in reality. While conventional AR systems achieve this with dedicated handheld controllers, a wired power source, and enough hardware real estate to accommodate powerful, power-hungry processors, mobile clients do not share those specifications and cannot afford gesture tracking in that conventional manner. Rather, a mobile device is limited in its power and processing resources. The embodiments herein optimize for a mobile device's power and processing constraints by limiting the image frames processed during gesture detection (e.g., discarding image frames that a machine learning model determines do not depict a hand). Thus, the methods described herein enable gesture tracking for AR object control on mobile client rendered AR systems without consuming excessive amounts of processing power, and present an immersive AR experience to the user.
  • Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
  • Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
  • In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
  • Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
  • Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
  • The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
  • Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment, or a server farm), while in other embodiments the processors may be distributed across a number of locations.
  • The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
  • The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
  • Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
  • Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.
  • As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
  • Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
  • As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
  • In addition, use of the “a” or “an” are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one and the singular also includes the plural unless it is obvious that it is meant otherwise.
  • Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for gesture tracking in an augmented reality environment executed on a mobile client through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.

Claims (20)

What is claimed is:
1. A non-transitory computer readable storage medium comprising stored instructions, the instructions when executed by a processor cause the processor to:
receive from a mobile client an image from a camera view of an environment, the image depicting a hand of a user;
apply a machine learning model to the image, the machine learning model trained on training image data representative of a plurality of hand formations, the machine learning model configured to identify a formation of the hand in the image as one of the plurality of hand formations;
provide for display on the mobile client, responsive to identification of the formation of the hand as a first hand formation of the plurality of hand formations, an augmented reality (AR) object in a first state from an AR engine in the camera view of the environment; and
provide for display on the mobile client, responsive to the formation of the hand identified as a second hand formation of the plurality of hand formations, the AR object in a second state from the AR engine in the camera view of the environment.
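For illustration only, a minimal Python sketch of the flow recited in claim 1, where the identified hand formation selects the state in which the AR engine presents the AR object. The Formation dataclass, FakeAREngine, label strings, and state names are assumptions, not taken from the claims or specification.

```python
from dataclasses import dataclass

@dataclass
class Formation:
    label: str         # e.g. "open_palm" (first formation) or "closed_fist" (second)
    confidence: float  # 0.0 .. 1.0, as reported by the classifier

class FakeAREngine:
    """Stand-in for a real AR/game engine handle."""
    def set_object_state(self, state: str) -> None:
        print(f"AR object -> {state}")

STATE_FOR_FORMATION = {"open_palm": "first_state", "closed_fist": "second_state"}

def handle_frame(formation: Formation, engine: FakeAREngine) -> None:
    """Update the AR object in the camera view based on the identified formation."""
    state = STATE_FOR_FORMATION.get(formation.label)
    if state is not None:
        engine.set_object_state(state)

# A classifier (not shown) has identified an open palm in the current frame:
handle_frame(Formation("open_palm", 0.93), FakeAREngine())
```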
2. The non-transitory computer readable storage medium of claim 1, wherein the machine learning model is a first machine learning model, and wherein the instructions further comprise instructions that when executed by the processor cause the processor to identify a location of the hand, the instructions to identify the location of the hand further comprising instructions that when executed by the processor cause the processor to:
apply a second machine learning model to the received image, the second machine learning model configured to classify real-world objects in the environment, the real-world objects including the hand;
receive a plurality of feature points associated with the real-world objects in the environment;
generate a three-dimensional (3D) virtual coordinate space based on the plurality of feature points; and
identify, based on a classification of the hand by the second machine learning model, the location of the hand associated with corresponding coordinates in the generated 3D virtual coordinate space.
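A hedged sketch of the hand-location step in claim 2: feature points classified by a second model are placed in a 3D virtual coordinate space, and the points labeled as the hand yield the hand's coordinates. The point format, the "hand" label, and the centroid heuristic are assumptions for illustration.

```python
import numpy as np
from typing import List, Optional, Tuple

def locate_hand(feature_points: List[Tuple[float, float, float]],
                labels: List[str]) -> Optional[np.ndarray]:
    """Return (x, y, z) for the hand in the virtual coordinate space, or None."""
    points = np.asarray(feature_points, dtype=float)          # (N, 3) virtual coordinates
    mask = np.array([label == "hand" for label in labels])    # classification by the 2nd model
    if not mask.any():
        return None                                           # no hand depicted in this frame
    return points[mask].mean(axis=0)                          # centroid of hand feature points

print(locate_hand([(0.1, 0.2, 1.0), (0.4, 0.1, 0.9), (2.0, 1.5, 3.0)],
                  ["hand", "hand", "table"]))                 # -> approximately [0.25 0.15 0.95]
```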
3. The non-transitory computer readable storage medium of claim 2, wherein the AR engine rendered object is provided for display based on the identified location.
4. The non-transitory computer readable storage medium of claim 2, wherein the instructions further comprise instructions that when executed by the processor cause the processor to remove data of a given image from memory responsive to the second machine learning model classifying an absence of any hand depicted within the given image.
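A small sketch of the memory behavior in claim 4, assuming a simple per-frame cache keyed by a frame identifier; the cache structure and names are invented for illustration.

```python
frame_cache = {}   # frame_id -> raw image bytes retained for downstream AR steps

def cache_or_discard(frame_id: int, frame: bytes, hand_present: bool) -> None:
    """Keep a frame only while the classifier reports a hand depicted in it."""
    if hand_present:
        frame_cache[frame_id] = frame
    else:
        frame_cache.pop(frame_id, None)   # free the data for hand-free frames

cache_or_discard(1, b"<jpeg bytes>", hand_present=False)   # nothing is retained
```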
5. The non-transitory computer readable storage medium of claim 1, wherein the instructions further comprise instructions that when executed by the processor cause the processor to train the machine learning model using the training image data, the instructions to train the machine learning model further comprising instructions that when executed by the processor cause the processor to:
receive a plurality of images of the plurality of hand formations; and
apply a respective label to each of the plurality of images of the plurality of hand formations, the training image data comprising the labeled plurality of images.
6. The non-transitory computer readable storage medium of claim 5, wherein each respective label corresponds to a computer executable command.
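A hedged sketch of the training-data preparation in claims 5 and 6: each training image receives a formation label, and each label maps to an executable command. The directory layout, label set, and command names are assumptions for illustration.

```python
from pathlib import Path

FORMATION_LABELS = ["open_palm", "closed_fist"]            # assumed label set
COMMAND_FOR_LABEL = {"open_palm": "spawn_object",          # claim 6: each label corresponds
                     "closed_fist": "dismiss_object"}      # to a computer executable command

def build_training_set(root: Path):
    """Return (image_path, label) pairs; each label's images sit in a folder of that name."""
    samples = []
    for label in FORMATION_LABELS:
        for image_path in sorted((root / label).glob("*.jpg")):
            samples.append((image_path, label))
    return samples

# e.g. build_training_set(Path("training_images"))
#   -> [(Path("training_images/open_palm/0001.jpg"), "open_palm"), ...]
```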
7. The non-transitory computer readable storage medium of claim 5, wherein the plurality of hand formations includes a user-customized hand formation.
8. The non-transitory computer readable storage medium of claim 7, wherein the instructions further comprise instructions that when executed by the processor cause the processor to prompt the user to provide a user-specified state of the AR engine rendered object, wherein an identification of the user-customized hand formation by the machine learning model indicates that the AR engine rendered object is to be provided for display in the user-specified state.
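A sketch of the customization path in claims 7 and 8, where a user-customized formation label is stored alongside the user-specified AR object state it should trigger; the label and state names are hypothetical.

```python
user_gesture_states = {}   # user-customized formation label -> user-specified AR object state

def register_custom_formation(label: str, desired_state: str) -> None:
    """Record the state the user wants the AR object to take for their custom formation."""
    user_gesture_states[label] = desired_state

def state_for(label: str, default: str = "unchanged") -> str:
    return user_gesture_states.get(label, default)

register_custom_formation("thumbs_up", "celebration_state")
print(state_for("thumbs_up"))   # -> celebration_state
```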
9. The non-transitory computer readable storage medium of claim 1, wherein the machine learning model is further configured to output a confidence score associated with the identified formation of the hand, and wherein providing for display the AR engine rendered object in the first state or in the second state is further responsive to the confidence score exceeding a threshold confidence score.
10. The non-transitory computer readable storage medium of claim 9, wherein the instructions further comprise instructions that when executed by the processor cause the processor to receive a selection of a user-specified threshold confidence score.
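A sketch of the confidence gate in claims 9 and 10: the AR object is only updated when the model's confidence score exceeds a threshold, which the user may override. The default value is an assumption; the claims leave the threshold unspecified.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.8   # assumed value

def should_update_ar_object(confidence: float,
                            user_threshold: Optional[float] = None) -> bool:
    """Act on a formation only when confidence exceeds the (possibly user-specified) threshold."""
    threshold = DEFAULT_THRESHOLD if user_threshold is None else user_threshold
    return confidence > threshold

print(should_update_ar_object(0.85))                      # True against the default
print(should_update_ar_object(0.85, user_threshold=0.9))  # False against a stricter user setting
```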
11. The non-transitory computer readable storage medium of claim 1, wherein the AR engine is a game engine.
12. A computer system comprising:
a gesture tracking module configured to:
receive from a mobile client an image from a camera view of an environment, the image depicting a hand of a user; and
apply a machine learning model to the image, the machine learning model trained on training image data representative of a plurality of hand formations, the machine learning model configured to identify a formation of the hand in the image as one of the plurality of hand formations; and
a rendering module configured to:
provide for display on the mobile client, responsive to identification of the formation of the hand as a first hand formation of the plurality of hand formations, an augmented reality (AR) object in a first state from an AR engine in the camera view of the environment; and
provide for display on the mobile client, responsive to the formation of the hand identified as a second hand formation of the plurality of hand formations, the AR object in a second state from the AR engine in the camera view of the environment.
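One possible decomposition matching the two modules recited in claim 12; the class and method names are illustrative, not taken from the specification.

```python
class GestureTrackingModule:
    """Receives camera images and identifies the hand formation."""
    def __init__(self, formation_model):
        self.formation_model = formation_model          # trained hand-formation classifier

    def identify_formation(self, image):
        return self.formation_model.predict(image)      # label for one of the known formations


class RenderingModule:
    """Maps an identified formation to the AR object state shown on the mobile client."""
    def __init__(self, ar_engine, state_for_formation):
        self.ar_engine = ar_engine                      # handle into the AR/game engine
        self.state_for_formation = state_for_formation  # e.g. {"open_palm": "first_state"}

    def display(self, formation_label):
        state = self.state_for_formation.get(formation_label)
        if state is not None:
            self.ar_engine.set_object_state(state)      # render the AR object in that state
```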
13. The system of claim 12, wherein the machine learning model is a first machine learning model, wherein the gesture tracking module is further configured to identify a location of the hand within the image by being further configured to:
apply a second machine learning model to the received image, the second machine learning model configured to classify real-world objects in the environment, the real-world objects including the hand;
receive a plurality of feature points associated with the real-world objects in the environment;
generate a three-dimensional (3D) virtual coordinate space based on the plurality of feature points; and
identify, based on a classification of the hand by the second machine learning model, the location of the hand associated with corresponding coordinates in the generated 3D virtual coordinate space.
14. The system of claim 13, wherein the AR engine rendered object is provided for display based on the identified location.
15. The system of claim 13, wherein the gesture tracking module is further configured to remove data of a given image from memory responsive to classification of an absence of any hand depicted within the given image by the second machine learning model.
16. A computer-implemented method comprising:
receiving from a mobile client an image from a camera view of an environment, the image depicting a hand of a user;
applying a machine learning model to the image, the machine learning model trained on training image data representative of a plurality of hand formations, the machine learning model configured to identify a formation of the hand in the image as one of the plurality of hand formations;
providing for display on the mobile client, responsive to identification of the formation of the hand as a first hand formation of the plurality of hand formations, an augmented reality (AR) object in a first state from an AR engine in the camera view of the environment; and
providing for display on the mobile client, responsive to the formation of the hand identified as a second hand formation of the plurality of hand formations, the AR object in a second state from the AR engine in the camera view of the environment.
17. The computer-implemented method of claim 16, wherein the machine learning model is a first machine learning model, further comprising identifying a location of the hand within the image by:
applying a second machine learning model to the received image, the second machine learning model configured to classify real-world objects in the environment, the real-world objects including the hand;
receiving a plurality of feature points associated with the real-world objects in the environment;
generating a three-dimensional (3D) virtual coordinate space based on the plurality of feature points; and
identifying, based on a classification of the hand by the second machine learning model, the location of the hand associated with corresponding coordinates in the generated 3D virtual coordinate space.
18. The computer-implemented method of claim 17, wherein the AR engine rendered object is provided for display based on the identified location.
19. The computer-implemented method of claim 17, further comprising removing data of a given image from memory responsive to the second machine learning model classifying an absence of any hand depicted within the given image.
20. A computer-implemented method comprising:
receiving from a mobile client an image from a camera view of an environment, the image depicting a portion of a body of a user;
providing the image to a machine learning model configured to identify a formation of the portion of the body in the image; and
providing for display on the mobile client, based on an identification of the formation by the machine learning model, an augmented reality (AR) object in the camera view of the environment.
US17/170,255 2020-02-07 2021-02-08 Gesture tracking for mobile rendered augmented reality Abandoned US20210247846A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/170,255 US20210247846A1 (en) 2020-02-07 2021-02-08 Gesture tracking for mobile rendered augmented reality

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062971766P 2020-02-07 2020-02-07
US17/170,255 US20210247846A1 (en) 2020-02-07 2021-02-08 Gesture tracking for mobile rendered augmented reality

Publications (1)

Publication Number Publication Date
US20210247846A1 true US20210247846A1 (en) 2021-08-12

Family

ID=77177475

Family Applications (3)

Application Number Title Priority Date Filing Date
US17/170,431 Abandoned US20210248826A1 (en) 2020-02-07 2021-02-08 Surface distinction for mobile rendered augmented reality
US17/170,255 Abandoned US20210247846A1 (en) 2020-02-07 2021-02-08 Gesture tracking for mobile rendered augmented reality
US17/170,629 Active US11393176B2 (en) 2020-02-07 2021-02-08 Video tools for mobile rendered augmented reality game

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US17/170,431 Abandoned US20210248826A1 (en) 2020-02-07 2021-02-08 Surface distinction for mobile rendered augmented reality

Family Applications After (1)

Application Number Title Priority Date Filing Date
US17/170,629 Active US11393176B2 (en) 2020-02-07 2021-02-08 Video tools for mobile rendered augmented reality game

Country Status (1)

Country Link
US (3) US20210248826A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11295503B1 (en) 2021-06-28 2022-04-05 Facebook Technologies, Llc Interactive avatars in artificial reality
US11475639B2 (en) * 2020-01-03 2022-10-18 Meta Platforms Technologies, Llc Self presence in artificial reality
US20220374130A1 (en) * 2021-04-21 2022-11-24 Facebook, Inc. Dynamic Content Rendering Based on Context for AR and Assistant Systems
US20230056020A1 (en) * 2021-08-19 2023-02-23 Meta Platforms Technologies, Llc Systems and methods for communicating model uncertainty to users
WO2023133623A1 (en) * 2022-01-14 2023-07-20 Shopify Inc. Systems and methods for generating customized augmented reality video
US20230244315A1 (en) * 2022-01-28 2023-08-03 Plantronics, Inc. Customizable gesture commands
US11886767B2 (en) 2022-06-17 2024-01-30 T-Mobile Usa, Inc. Enable interaction between a user and an agent of a 5G wireless telecommunication network using augmented reality glasses
WO2024050620A1 (en) * 2022-09-09 2024-03-14 Shopify Inc. Methods for calibrating augmented reality scenes
WO2024101643A1 * 2022-11-07 2024-05-16 Samsung Electronics Co., Ltd. Augmented reality device which recognizes object using artificial intelligence model of external device, and operation method therefor

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020129115A1 (en) * 2018-12-17 2020-06-25 株式会社ソニー・インタラクティブエンタテインメント Information processing system, information processing method and computer program
US11488116B2 (en) * 2020-05-21 2022-11-01 HUDDL Inc. Dynamically generated news feed
US11944905B2 (en) * 2021-09-28 2024-04-02 Sony Group Corporation Method to regulate jumps and falls by playable characters in XR spaces
US11617949B1 (en) * 2021-09-28 2023-04-04 Sony Group Corporation Methods for predefining virtual staircases connecting platforms in extended reality (XR) environments
US11759711B2 (en) 2021-09-28 2023-09-19 Sony Group Corporation Method for quasi-random placement of virtual items in an extended reality (XR) space
US11684848B2 (en) * 2021-09-28 2023-06-27 Sony Group Corporation Method to improve user understanding of XR spaces based in part on mesh analysis of physical surfaces
GB2612767A (en) * 2021-11-03 2023-05-17 Sony Interactive Entertainment Inc Virtual reality interactions

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10521671B2 (en) * 2014-02-28 2019-12-31 Second Spectrum, Inc. Methods and systems of spatiotemporal pattern recognition for video content development
US11861906B2 (en) * 2014-02-28 2024-01-02 Genius Sports Ss, Llc Data processing systems and methods for enhanced augmentation of interactive video content
US10695680B2 (en) * 2016-12-09 2020-06-30 Unity IPR ApS Systems and methods for creating, broadcasting, and viewing 3D content
US11113885B1 (en) * 2017-09-13 2021-09-07 Lucasfilm Entertainment Company Ltd. Real-time views of mixed-reality environments responsive to motion-capture data
JP2022512600A (en) * 2018-10-05 2022-02-07 マジック リープ, インコーポレイテッド Rendering location-specific virtual content anywhere
US11030812B2 (en) * 2019-01-02 2021-06-08 The Boeing Company Augmented reality system using enhanced models
EP3895416A4 (en) * 2019-03-27 2022-03-02 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Three-dimensional localization using light-depth images
US11276216B2 (en) * 2019-03-27 2022-03-15 Electronic Arts Inc. Virtual animal character generation from image or video data
US10983662B2 (en) * 2019-04-01 2021-04-20 Wormhole Labs, Inc. Distally shared, augmented reality space
CN111815755B (en) * 2019-04-12 2023-06-30 Oppo广东移动通信有限公司 Method and device for determining blocked area of virtual object and terminal equipment
US11151792B2 (en) * 2019-04-26 2021-10-19 Google Llc System and method for creating persistent mappings in augmented reality
US20200364937A1 (en) * 2019-05-16 2020-11-19 Subvrsive, Inc. System-adaptive augmented reality
US11887253B2 (en) * 2019-07-24 2024-01-30 Electronic Arts Inc. Terrain generation and population system
WO2021087065A1 (en) * 2019-10-31 2021-05-06 Magic Leap, Inc. Cross reality system with quality information about persistent coordinate frames
JP2023504775A (en) * 2019-11-12 2023-02-07 マジック リープ, インコーポレイテッド Cross-reality system with localization services and shared location-based content
EP4103910A4 (en) * 2020-02-13 2024-03-06 Magic Leap Inc Cross reality system with accurate shared maps
US11830149B2 (en) * 2020-02-13 2023-11-28 Magic Leap, Inc. Cross reality system with prioritization of geolocation information for localization
US20210256766A1 (en) * 2020-02-13 2021-08-19 Magic Leap, Inc. Cross reality system for large scale environments
EP4104001A4 (en) * 2020-02-13 2024-03-13 Magic Leap Inc Cross reality system with map processing using multi-resolution frame descriptors
US20210275908A1 (en) * 2020-03-05 2021-09-09 Advanced Micro Devices, Inc. Adapting encoder resource allocation based on scene engagement information
US11494875B2 (en) * 2020-03-25 2022-11-08 Nintendo Co., Ltd. Systems and methods for machine learned image conversion
WO2021222371A1 (en) * 2020-04-29 2021-11-04 Magic Leap, Inc. Cross reality system for large scale environments

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11475639B2 (en) * 2020-01-03 2022-10-18 Meta Platforms Technologies, Llc Self presence in artificial reality
US11861757B2 (en) 2020-01-03 2024-01-02 Meta Platforms Technologies, Llc Self presence in artificial reality
US11861315B2 (en) 2021-04-21 2024-01-02 Meta Platforms, Inc. Continuous learning for natural-language understanding models for assistant systems
US20220374130A1 (en) * 2021-04-21 2022-11-24 Facebook, Inc. Dynamic Content Rendering Based on Context for AR and Assistant Systems
US11966701B2 (en) * 2021-04-21 2024-04-23 Meta Platforms, Inc. Dynamic content rendering based on context for AR and assistant systems
US11893674B2 (en) 2021-06-28 2024-02-06 Meta Platforms Technologies, Llc Interactive avatars in artificial reality
US11295503B1 (en) 2021-06-28 2022-04-05 Facebook Technologies, Llc Interactive avatars in artificial reality
US11789544B2 (en) * 2021-08-19 2023-10-17 Meta Platforms Technologies, Llc Systems and methods for communicating recognition-model uncertainty to users
US20230056020A1 (en) * 2021-08-19 2023-02-23 Meta Platforms Technologies, Llc Systems and methods for communicating model uncertainty to users
WO2023133623A1 (en) * 2022-01-14 2023-07-20 Shopify Inc. Systems and methods for generating customized augmented reality video
US20230244315A1 (en) * 2022-01-28 2023-08-03 Plantronics, Inc. Customizable gesture commands
US11899846B2 (en) * 2022-01-28 2024-02-13 Hewlett-Packard Development Company, L.P. Customizable gesture commands
US11886767B2 (en) 2022-06-17 2024-01-30 T-Mobile Usa, Inc. Enable interaction between a user and an agent of a 5G wireless telecommunication network using augmented reality glasses
WO2024050620A1 (en) * 2022-09-09 2024-03-14 Shopify Inc. Methods for calibrating augmented reality scenes
WO2024101643A1 * 2022-11-07 2024-05-16 Samsung Electronics Co., Ltd. Augmented reality device which recognizes object using artificial intelligence model of external device, and operation method therefor

Also Published As

Publication number Publication date
US20210245043A1 (en) 2021-08-12
US20210248826A1 (en) 2021-08-12
US11393176B2 (en) 2022-07-19

Similar Documents

Publication Publication Date Title
US20210247846A1 (en) Gesture tracking for mobile rendered augmented reality
US10832039B2 (en) Facial expression detection method, device and system, facial expression driving method, device and system, and storage medium
US11526713B2 (en) Embedding human labeler influences in machine learning interfaces in computing environments
US10924676B2 (en) Real-time visual effects for a live camera view
KR102222642B1 (en) Neural network for object detection in images
US10572072B2 (en) Depth-based touch detection
US11715224B2 (en) Three-dimensional object reconstruction method and apparatus
US20180088663A1 (en) Method and system for gesture-based interactions
JP2016503220A (en) Parts and state detection for gesture recognition
EP2972950B1 (en) Segmentation of content delivery
KR102636243B1 (en) Method for processing image and electronic device thereof
US11816876B2 (en) Detection of moment of perception
EP3757817A1 (en) Electronic device and control method therefor
US20220222470A1 (en) Automatic content recognition and information in live streaming suitable for video games
US11640700B2 (en) Methods and systems for rendering virtual objects in user-defined spatial boundary in extended reality environment
US11747954B1 (en) Systems and methods for organizing contents in XR environments
CN114510142B (en) Gesture recognition method based on two-dimensional image, gesture recognition system based on two-dimensional image and electronic equipment
JP2021114313A (en) Face composite image detecting method, face composite image detector, electronic apparatus, storage medium and computer program
KR20220021581A (en) Robot and control method thereof
US20220334674A1 (en) Information processing apparatus, information processing method, and program
US11640695B2 (en) Digital garment generation
KR102431386B1 (en) Method and system for interaction holographic display based on hand gesture recognition
US20230154126A1 (en) Creating a virtual object response to a user input
KR20240062249A (en) Electronic apparatus and method of acquiring touch coordinates thereof
KR20230170485A (en) An electronic device for obtaining image data regarding hand gesture and a method for operating the same

Legal Events

Date Code Title Description
AS Assignment

Owner name: KRIKEY, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHRIRAM, KETAKI LALITHA UTHRA;SHRIRAM, JHANVI SAMYUKTA LAKSHMI;OLOKOBA, YUSUF OLANREWAJU;SIGNING DATES FROM 20210204 TO 20210205;REEL/FRAME:055221/0504

STPP Information on status: patent application and granting procedure in general

Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION