WO2021088498A1 - Virtual object display method and electronic device - Google Patents

Virtual object display method and electronic device

Info

Publication number
WO2021088498A1
WO2021088498A1 (PCT application PCT/CN2020/113340; CN2020113340W)
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
map
coordinate system
slam
pose
Prior art date
Application number
PCT/CN2020/113340
Other languages
English (en)
French (fr)
Inventor
姜军
林涛
苏琪
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Priority claimed from CN201911092326.0A (external priority; patent CN112785715B)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP20885155.0A (patent EP4030391A4)
Publication of WO2021088498A1
Priority to US17/718,734 (patent US11776151B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/36 Input/output arrangements for on-board computers
    • G01C21/3626 Details of the output of route guidance instructions
    • G01C21/3647 Guidance involving output of stored or live camera images or video streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B29/00 Maps; Plans; Charts; Diagrams, e.g. route diagram
    • G09B29/10 Map spot or coordinate position indicators; Map reading aids
    • G09B29/106 Map spot or coordinate position indicators; Map reading aids using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/012 Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/004 Annotating, labelling

Definitions

  • This application relates to the technical field of virtual scenes, in particular to virtual object display methods and electronic devices.
  • VR: Virtual Reality
  • AR: Augmented Reality
  • MR: Mixed Reality
  • VR technology is a simulation technology that can create and experience a virtual world
  • AR technology is a technology that can superimpose and interact between virtual reality and the real world
  • MR technology merges the real world and the virtual world to generate a new visualization environment, and comprehensively builds an interactive feedback loop between the real world, the virtual world, and the user.
  • SLAM: Simultaneous Localization and Mapping
  • Concretely, SLAM technology enables an electronic device (such as a mobile phone, VR glasses, or another mobile electronic device) that starts moving from an unknown location in the environment to localize itself during the movement based on pose estimates and a map, while incrementally building the map on the basis of its own positioning to facilitate subsequent localization.
  • a system or module using SLAM technology can also be called a spatial positioning engine.
  • the virtual objects of the VR or AR application in the global coordinate system can be displayed on the electronic device.
  • the position and posture of the electronic device in the local coordinate system will drift, causing the display position and direction of the virtual object on the electronic device to also shift, which brings inconvenience to the user.
  • the embodiments of the present application provide a virtual object display method and an electronic device, which can solve the problem of the pose drift of the electronic device to a certain extent, and improve the user experience.
  • The embodiments of the present application provide a virtual object display method, which can be applied to an electronic device with a display component (such as a display screen, for example a touch screen, flexible screen, or curved screen, or an optical component) and a camera.
  • the electronic device can be a handheld terminal (such as a mobile phone), VR or AR glasses, a drone, an unmanned vehicle, etc.
  • The method includes: detecting the user's operation to open the application; in response to the operation, downloading a global sub-map and storing it in the simultaneous localization and mapping (SLAM) system of the electronic device, where the global sub-map is the sub-map corresponding to the location of the electronic device in the global map; and displaying the position and posture of the virtual object on the display component, where the position and posture of the virtual object are calculated by the SLAM system at least according to the video image collected by the camera and the global sub-map.
  • SLAM: simultaneous localization and mapping
  • a BA (Bundle Adjustment) method may be used to calculate the pose.
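For illustration, the following is a minimal pose-only (motion-only) bundle-adjustment sketch in Python: it refines a single camera pose by least-squares minimization of reprojection error over known 3D points and their 2D observations. The SciPy/OpenCV implementation, the function names, and the 6-vector pose parameterization are assumptions for illustration, not the patent's BA implementation.

```python
# Minimal pose-only bundle adjustment sketch (illustrative, not the patent's BA):
# refine one camera pose by least-squares minimization of reprojection error.
import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(pose6, pts3d, pts2d, K):
    # pose6 = [rx, ry, rz, tx, ty, tz]: Rodrigues rotation vector + translation
    rvec, tvec = pose6[:3], pose6[3:]
    projected, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
    return (projected.reshape(-1, 2) - pts2d).ravel()  # per-point pixel residuals

def refine_pose(pose6_init, pts3d, pts2d, K):
    # pts3d: (N, 3) map points; pts2d: (N, 2) observed pixels; K: 3x3 intrinsics
    result = least_squares(reprojection_residuals, pose6_init,
                           args=(pts3d, pts2d, K), method="lm")
    return result.x  # refined pose, same 6-vector layout
```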
  • the posture of the virtual object may refer to the orientation of the virtual object, for example.
  • The so-called "operation to open the application" can be opening the application by clicking, touching, sliding, or shaking, or by voice control or other means, which is not limited in this application.
  • In response to the operation, the navigation function in the application may be activated, the camera may be activated, and so on.
  • the electronic device may also initiate the step of downloading the global sub-map in other ways, for example, by detecting changes in ambient light to initiate the step of downloading the global sub-map.
  • the server can be used as a platform for providing content and information support to VR or AR or MR applications of electronic devices.
  • a global map is stored in the server.
  • the global map is a high-precision map with a large geographic area.
  • the so-called "large geographic area” is relative to the geographic area represented by the SLAM map in the electronic device.
  • the global map may be obtained by integrating multiple SLAM maps generated by one or more electronic devices according to certain rules.
  • The global sub-map is the sub-map corresponding to the location of the electronic device in the global map; that is, taking the actual location of the electronic device as a starting point, the map content within a preset area around that starting point serves as the global sub-map, as sketched below.
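A minimal sketch of cutting such a sub-map out of a global map by a preset radius around the device's location; the GlobalMap fields and the 100 m radius are illustrative assumptions, not values from the patent.

```python
# Illustrative sub-map extraction: keep the keyframes and 3D map points that
# fall within a preset radius of the device's reported starting point.
import numpy as np

def extract_global_submap(global_map, device_xyz, radius_m=100.0):
    center = np.asarray(device_xyz, dtype=float)
    keyframes = [kf for kf in global_map.keyframes
                 if np.linalg.norm(kf.position - center) <= radius_m]
    points = [pt for pt in global_map.points
              if np.linalg.norm(pt.xyz - center) <= radius_m]
    return {"keyframes": keyframes, "points": points}
```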
  • the electronic device can be installed with virtual scene applications such as VR or AR or MR applications, and can run the VR or AR or MR applications based on user operations (such as clicking, touching, sliding, shaking, voice control, etc.).
  • the electronic device can collect the video image in the environment through the local camera, combine the collected video image and the downloaded global sub-map to determine the current pose of the electronic device, and then display the position of the virtual object on the display component based on the current pose of the electronic device And gesture.
  • the virtual object may correspondingly be a virtual object in a VR scene, an AR scene, or an MR scene (that is, an object in a virtual environment).
  • The SLAM system continuously builds the SLAM map while the electronic device keeps moving, and generates the final pose of the electronic device from the pose estimated against the SLAM map and the pose estimated from data collected by its own sensors.
  • In this process, noise is continuously introduced: the pose estimated from the SLAM map accumulates errors, the sensor data is also noisy, and the pose estimated from the device's own sensor data accumulates errors as well, leading to pose drift.
  • In this solution, after the electronic device detects the user's operation to open the application, it requests the server to download the global sub-map and uses the global sub-map, which has higher accuracy than the SLAM map, as a visual observation input to the SLAM system.
  • The SLAM system can then estimate the pose of the electronic device using the global sub-map and the collected video images, which effectively reduces or even eliminates the pose drift that occurs in long-term pose estimation by the SLAM system.
  • This ensures that during long-term (for example, more than 1 minute) operation of a VR, AR, or MR application, the display position and direction of the virtual object on the electronic device will not shift, and the virtual object can be displayed correctly for a long time (for example, displayed in a relatively accurate manner for the environment and duration represented by the video image), thereby improving the user experience.
  • The pose of the electronic device in the global sub-map can also be called the global pose; correspondingly, the pose of the electronic device in the SLAM map it builds itself can also be called the local pose.
  • The pose data of the electronic device is used to characterize the position and posture of the virtual object, and the pose data of the electronic device is obtained by the SLAM system performing pose calculation at a first frequency, at least according to the video image collected by the camera and the global sub-map.
  • the first frequency represents the frequency at which the SLAM system in the electronic device performs global pose estimation, that is, the frequency at which the global sub-map is called.
  • The value range of the first frequency may be 10 to 30 Hz, that is, 10 to 30 frames per second; in other words, the frequency of calling the global sub-map may be a value between 10 and 30 frames per second.
  • The first frequency may be numerically less than or equal to the frequency at which the display component (e.g., display panel) displays the video stream.
  • the SLAM system of the electronic device can call the global sub-map at a relatively high fixed frequency, so as to realize the posture tracking or posture update of the electronic device.
  • The global pose (i.e., the pose data) of the electronic device is obtained in real time.
  • the position and pose of the virtual object in the AR scene can be displayed and updated in real time in the display component according to the global pose of the electronic device.
  • the position and posture of the virtual object will not jump. This is because, on the one hand, during this period of time, the SLAM system in the electronic device accesses the stored global sub-map to calculate the global pose of the electronic device, which can overcome the inaccuracy of the pre-built SLAM map in the electronic device.
  • this embodiment can display the virtual object for a long time and the virtual object will not be misaligned in the screen, eliminating the phenomenon of jumping of the virtual object caused by the sudden change of the pose, and further improving the user experience.
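A minimal sketch of this "first frequency" behavior: local tracking runs on every frame, while global pose estimation against the stored sub-map is throttled to a fixed rate (10 Hz here, within the 10-30 Hz range above). The slam and camera interfaces are hypothetical names for illustration.

```python
# Sketch of throttling global pose estimation to a fixed "first frequency".
# Local SLAM tracking runs per frame; the global sub-map is consulted at most
# freq_hz times per second.
import time

def tracking_loop(slam, global_submap, camera, freq_hz=10.0):
    min_interval = 1.0 / freq_hz
    last_global = 0.0
    for frame in camera.stream():
        pose = slam.track_local(frame)            # runs on every frame
        now = time.monotonic()
        if now - last_global >= min_interval:     # at most freq_hz per second
            pose = slam.track_global(frame, global_submap)
            last_global = now
        yield pose                                # drives virtual-object display
```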
  • the SLAM map may include, for example, the following map content: multiple key frames, triangulation feature points, and the association between the key frames and the feature points.
  • the key frame may be formed based on the image collected by the camera and the camera parameters used to generate the image (for example, the pose of the electronic device in the SLAM coordinate system).
  • The feature points represent different 3D map points in three-dimensional space in the SLAM map, together with the feature descriptions at those 3D map points. Each feature point can have an associated feature location; each feature point represents a 3D coordinate position and is associated with one or more descriptors.
  • Feature points can also be called 3D features, 3D feature points, or other suitable names.
  • 3D map points represent coordinates on the three-dimensional space axes X, Y, and Z.
  • In the SLAM map, the 3D map points represent coordinates on the X, Y, and Z axes of the local coordinate system.
  • In the global sub-map, the 3D map points represent coordinates on the X, Y, and Z axes of the global coordinate system.
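The map contents listed above can be pictured with a small data-structure sketch; all field names are assumptions for illustration.

```python
# Illustrative data structures for the SLAM map contents described above:
# keyframes (image + camera pose), 3D map points with descriptors, and the
# associations between keyframes and map points.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class MapPoint:
    xyz: np.ndarray                                    # 3D position on the X, Y, Z axes
    descriptors: list = field(default_factory=list)    # one or more descriptors

@dataclass
class Keyframe:
    image: np.ndarray                                  # image collected by the camera
    pose: np.ndarray                                   # 4x4 device pose used to generate the image
    observations: dict = field(default_factory=dict)   # map-point id -> 2D pixel location

@dataclass
class SlamMap:
    keyframes: list = field(default_factory=list)
    points: dict = field(default_factory=dict)         # map-point id -> MapPoint
```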
  • The process of the pose calculation includes: performing pose calculation based on the video image collected by the camera, the global sub-map, and the motion data collected by the electronic device, to obtain the pose data of the electronic device, where the motion data includes movement speed data and movement direction data.
  • The motion data collected by the electronic device may be, for example, motion data collected by an inertial measurement unit (IMU) in the electronic device; the IMU can collect the angular velocity and linear acceleration of the electronic device at a high frequency, and the angular velocity and linear acceleration can be integrated to estimate the pose of the electronic device.
  • IMU: inertial measurement unit
  • In this way, the electronic device can use the SLAM algorithm to call the 3D features (3D map points) of the global sub-map at a high frequency, based on the collected video images and the high-frequency IMU data. Introducing the IMU further improves the accuracy of the estimated global pose, ensures that the 3D features (3D map points) of the global sub-map are effectively used as measurement values in the SLAM algorithm, and performs pose estimation with high precision to avoid pose drift and jitter. A sketch of the IMU integration step follows.
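A minimal sketch of the IMU's contribution, assuming ideal, bias-free measurements: angular velocity and linear acceleration are integrated between camera frames to propagate the pose, which the visual observations then correct. This is an illustration, not the patent's filter.

```python
# Sketch of one IMU integration step (ideal, bias-free measurements assumed):
# propagate rotation R, position p, and velocity v from the gyroscope angular
# rate and accelerometer reading over a small time step dt.
import numpy as np
import cv2

GRAVITY = np.array([0.0, 0.0, -9.81])

def integrate_imu(R, p, v, gyro, accel, dt):
    dR, _ = cv2.Rodrigues((gyro * dt).reshape(3, 1))  # rotation from angular velocity
    R_next = R @ dR
    a_world = R @ accel + GRAVITY                     # body acceleration in world frame
    v_next = v + a_world * dt
    p_next = p + v * dt + 0.5 * a_world * dt ** 2
    return R_next, p_next, v_next
```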
  • The downloading of the global sub-map in response to the operation includes: in response to the operation, sending indication information of the initial position of the electronic device to the server; and receiving the global sub-map from the server, where the global sub-map is determined according to the initial position of the electronic device.
  • The embodiment of the application thus requests the download of the global sub-map by uploading indication information of the initial position of the electronic device, without uploading a video image, so that the global sub-map related to the initial position of the electronic device can be obtained. This is also beneficial for saving bandwidth resources, reducing the processing burden of the server, and reducing or eliminating the risk of privacy leakage.
  • The indication information of the initial location of the electronic device includes first location fingerprint information for indicating the initial location of the electronic device; the global sub-map corresponds to second location fingerprint information, and the first location fingerprint information matches the second location fingerprint information.
  • the initial location indicated by the location fingerprint information may be the geographic location information of the electronic device when the electronic device requests to download the map.
  • the source of the location fingerprint information includes GNSS/WiFi/Bluetooth/base station and other positioning methods.
  • The location fingerprint information uploaded to the server is matched against the location fingerprint information of the global sub-maps, so that a useful global sub-map can be downloaded to the electronic device side. This improves matching efficiency and accuracy, thereby helping to reduce the time delay of map downloading. A sketch of such a request follows.
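A sketch of a sub-map request keyed by a location fingerprint (a GNSS fix plus visible WiFi/Bluetooth/base-station identifiers, matching the sources listed above). The endpoint path and field names are illustrative assumptions, not a real server API.

```python
# Sketch of requesting a global sub-map by location fingerprint.
import json
import urllib.request

def request_global_submap(server_url, gnss_fix, wifi_bssids, cell_id):
    fingerprint = {
        "gnss": gnss_fix,          # e.g. {"lat": 39.9, "lon": 116.4}
        "wifi": wifi_bssids,       # list of visible WiFi BSSIDs
        "cell": cell_id,           # serving base-station identifier
    }
    body = json.dumps({"fingerprint": fingerprint}).encode("utf-8")
    req = urllib.request.Request(server_url + "/global-submap", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # serialized global sub-map
```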
  • the method further includes: the electronic device updates the SLAM map of the SLAM system according to the posture data of the electronic device.
  • Specifically, the electronic device can feed its global pose back into the SLAM map in the global coordinate system, and merge the current image frame (key frame) into the SLAM map in the global coordinate system based on the global pose, thereby extending the SLAM map; the updated SLAM map is more accurate than a traditional SLAM map.
  • Before the updating of the SLAM map of the SLAM system according to the pose data of the electronic device, the method further includes: determining, according to the K-th frame image in the video images collected by the camera and the SLAM map in the first coordinate system, the first pose data of the electronic device in the SLAM map in the first coordinate system, where K is an integer greater than or equal to 1; determining, according to the K-th frame image and the global sub-map in the second coordinate system, the second pose data of the electronic device in the global sub-map in the second coordinate system; obtaining, according to the first pose data and the second pose data, coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map; and transforming the SLAM map in the first coordinate system into the SLAM map in the second coordinate system according to the coordinate system transformation information. Correspondingly, the updating of the SLAM map of the SLAM system according to the pose data of the electronic device includes: using the pose data of the electronic device as the pose data in the SLAM map in the second coordinate system to update the SLAM map in the second coordinate system.
  • The K-th frame image refers to a certain frame in the video image sequence collected by the camera. It should be understood that the video image collected by the camera may be a video sequence (video stream) comprising multiple frames of images, and the K-th frame image may be any one frame in that video stream.
  • the coordinate system used to construct the SLAM map can be called the first coordinate system.
  • The first coordinate system in this document may also be called the local coordinate system, the SLAM coordinate system, the camera coordinate system, or some other suitable term in some application scenarios.
  • The pose of the electronic device expressed in the local coordinate system can be called the local pose.
  • the coordinate system used to construct the global map may be called the second coordinate system.
  • the second coordinate system may also be called the global coordinate system, the world coordinate system, or some other appropriate terminology.
  • The pose of the electronic device expressed in the global coordinate system can be referred to as the global pose.
  • The pose data of the electronic device is the pose data of the electronic device in the first coordinate system, or the pose data of the electronic device in the second coordinate system; the first coordinate system is the coordinate system of the SLAM map of the SLAM system, and the second coordinate system is the coordinate system of the global sub-map.
  • The embodiment of the application obtains the pose of the terminal in the local coordinate system and its pose in the global coordinate system from the same frame. Based on these two poses, the coordinate system transformation information between the two coordinate systems can be obtained (for example, a coordinate system transformation matrix). With this coordinate system transformation matrix, the two coordinate systems can be synchronized: information originally expressed in the local coordinate system (such as the local pose, image feature points, and 3D map points of the SLAM map) can be transformed into the global coordinate system based on the coordinate system transformation matrix. In this way, the pose and 3D map points in the SLAM system are expressed in the same coordinate system as the 3D map points in the global sub-map.
  • Furthermore, the 3D map points in the global sub-map can be used as measurement-value input to the SLAM system, realizing tight coupling between the global sub-map and the SLAM system; the global pose of the electronic device is then tracked in real time through pose estimation, which effectively eliminates the drift of SLAM pose tracking.
  • the global pose of the electronic device can be used as the pose data in the SLAM map in the global coordinate system to update the SLAM map in the second coordinate system.
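The synchronization described above can be written compactly with 4x4 homogeneous pose matrices; the following is a generic sketch of the computation (matrix names are assumptions), not code from the patent.

```python
# Sketch of coordinate-system synchronization: from the same frame K we have
# the device pose in the local (SLAM) frame and in the global frame, both as
# 4x4 homogeneous matrices. Composing them gives the local->global transform,
# which can then re-express SLAM map points in global coordinates.
import numpy as np

def local_to_global_transform(T_local_cam, T_global_cam):
    # Solve T_global_cam = T_global_local @ T_local_cam for T_global_local.
    return T_global_cam @ np.linalg.inv(T_local_cam)

def transform_points(T, pts):
    # Apply a 4x4 transform to an (N, 3) array of 3D map points.
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (pts_h @ T.T)[:, :3]
```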
  • The determining, according to the K-th frame image in the video image collected by the camera and the SLAM map in the first coordinate system, of the first pose data of the electronic device in the SLAM map in the first coordinate system includes: obtaining the first pose data of the electronic device in the SLAM map in the first coordinate system according to the K-th frame image, the SLAM map in the first coordinate system, and the motion data collected by the electronic device, where the motion data includes motion speed data and motion direction data.
  • the electronic device is provided with an IMU
  • the input signal of the SLAM system includes the video image collected by the camera, the motion data collected by the IMU, and the SLAM map in the local coordinate system.
  • The IMU detects the angular velocity and linear acceleration of the electronic device at a high frequency; by integrating the angular velocity and linear acceleration separately, the pose of the electronic device can be calculated.
  • The video image collected by the camera is matched against the SLAM map in the local coordinate system, so that the position and posture of the electronic device can also be calculated from vision. Then, by fusing these two poses with a certain algorithm, the first pose data can be obtained.
  • If the electronic device is also provided with a positioning module related to pose or movement (GPS positioning, BeiDou positioning, WiFi positioning, base-station positioning, etc.), the SLAM system can also calculate the first pose data with reference to the video images collected by the camera, the motion data collected by the IMU, the SLAM map in the local coordinate system, and the data collected by the positioning module, to further improve the accuracy of the first pose data.
  • The determining, according to the K-th frame image and the global sub-map in the second coordinate system, of the second pose data of the electronic device in the global sub-map in the second coordinate system includes: performing feature extraction on the K-th frame image to obtain image features; performing feature matching between the image features and the global sub-map in the second coordinate system to obtain map features that match the image features; and calculating, according to the image features and the map features, the second pose data of the electronic device in the global sub-map in the second coordinate system.
  • the electronic device performs feature detection on the K-th frame image, and extracts the image position of the feature in the K-th frame image.
  • The feature detection algorithm is not limited here; feature detection methods such as FAST, ORB, SIFT, SURF, D2Net, and SuperPoint may be used. Feature description is then performed for each detected feature.
  • The feature description algorithm is likewise not limited; feature description methods such as ORB, SIFT, SURF, BRIEF, BRISK, FREAK, D2Net, and SuperPoint may be used, forming a one-dimensional vector for subsequent feature matching.
  • the electronic device can match the most similar map content (such as one or more key frames) to the K-th frame image from the global submap.
  • Specific methods include, for example, traditional image retrieval methods such as BOW and VLAD, and newer AI-based image retrieval methods such as NetVLAD.
  • After matching, registration algorithms such as PnP, EPnP, or 3D-3D registration can be used to calculate the second pose data. A sketch of this feature-matching and PnP step follows.
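A sketch of the visual global-localization step with concrete stand-ins: ORB features (one of the listed options) are matched against descriptors of 3D points in the global sub-map, and the resulting 2D-3D correspondences feed a RANSAC PnP solve. The sub-map arrays and the function shape are illustrative assumptions.

```python
# Sketch: ORB feature matching against the global sub-map, then RANSAC PnP.
import numpy as np
import cv2

def estimate_global_pose(frame_gray, submap_descriptors, submap_points_3d, K):
    orb = cv2.ORB_create(nfeatures=2000)
    keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, submap_descriptors)   # 2D-3D matches
    pts2d = np.float32([keypoints[m.queryIdx].pt for m in matches])
    pts3d = np.float32([submap_points_3d[m.trainIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return (rvec, tvec) if ok else None   # the "second pose data" (global pose)
```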
  • The embodiments of the present application can be implemented on the electronic device side, making full use of the computing power of the electronic device to calculate the first pose data and the second pose data, which improves processing efficiency and reduces the calculation burden of the server.
  • Alternatively, the determining, according to the K-th frame image and the global sub-map in the second coordinate system, of the second pose data of the electronic device in the global sub-map in the second coordinate system includes: sending the K-th frame image to the server; and receiving the second pose data from the server, where the second pose data is determined by the server through feature extraction and feature matching according to the K-th frame image and the global sub-map in the second coordinate system.
  • the Kth frame image may be the first frame image in the video image sequence taken by the camera.
  • In this case, the electronic device needs to download the global sub-map of the corresponding area first.
  • The process of downloading the map takes a certain amount of time.
  • Therefore, the first global pose estimation can be done on the server side: after the application is started, the server performs the first global pose estimation and, while doing so, obtains the global sub-map and transmits it to the electronic device. This improves the user's access speed to the application, and the user does not perceive the delay of the map download process; the waiting caused by download latency is avoided, further improving the user experience. A sketch of this start-up flow follows.
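A sketch of the start-up flow, where the server's first global pose estimation runs while the sub-map download is still in flight; the server and slam interfaces are hypothetical names for illustration.

```python
# Sketch: server-side first global pose estimation overlapping the sub-map
# download, so the user does not perceive the download delay.
import asyncio

async def startup(server, slam, first_frame):
    pose_task = asyncio.create_task(server.estimate_global_pose(first_frame))
    submap_task = asyncio.create_task(server.fetch_global_submap(first_frame))
    first_pose = await pose_task          # display can begin from this pose
    slam.store_submap(await submap_task)  # later poses are computed on-device
    return first_pose
```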
  • The displaying of the position and posture of the virtual object on the display component includes: displaying a first interface on the display component, and displaying a video stream and a virtual object on the first interface, where the position and posture of the virtual object relative to the video stream are displayed based on the pose data of the electronic device, and the pose data of the electronic device is obtained by performing pose calculation at least according to the video image collected by the camera and the global sub-map.
  • The position and posture of the virtual object relative to the video stream are, for example, the position and posture of the virtual object superimposed on the video stream.
  • That is, the position and posture of the virtual object superimposed on the video stream are displayed based on the pose data of the electronic device, which is obtained by performing pose calculation at least according to the video stream collected by the camera and the global sub-map.
  • AR applications can use computer graphics and visualization techniques to generate virtual objects that do not exist in the real environment, and based on the current global pose of the electronic device, superimpose the virtual objects into the video stream in the viewfinder frame. That is, the position and posture of the virtual object superimposed on the video stream are displayed based on the posture data of the electronic device.
  • the pose data of the electronic device is obtained by performing pose calculation processing at least according to the video stream collected by the camera and the global submap.
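A minimal sketch of this superimposition step, under the assumption that the current global pose is available as an OpenCV rotation vector and translation: the virtual object's 3D anchor points are projected into the frame and drawn. A real AR renderer would rasterize a full 3D model; cv2.projectPoints suffices for illustration.

```python
# Sketch: superimpose a virtual object on the video stream using the device's
# current global pose (rvec, tvec) and camera intrinsics K.
import numpy as np
import cv2

def overlay_virtual_object(frame_bgr, object_points_3d, rvec, tvec, K):
    pts2d, _ = cv2.projectPoints(object_points_3d, rvec, tvec, K, None)
    for u, v in pts2d.reshape(-1, 2):
        cv2.circle(frame_bgr, (int(round(u)), int(round(v))), 4, (0, 255, 0), -1)
    return frame_bgr
```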
  • the solution of the present application can also be applied to a VR scene (for example, applied to VR glasses).
  • the content displayed on the display screen may only be a virtual object without the video stream of the real environment.
  • The embodiments of the present application provide yet another virtual object display method, which can be applied to electronic devices with display components and cameras; the electronic device can be a handheld terminal (such as a mobile phone), VR or AR glasses, a drone, an unmanned vehicle, etc.
  • The method includes: obtaining a global sub-map and storing it in the simultaneous localization and mapping (SLAM) system of the electronic device, where the global sub-map is the sub-map corresponding to the location of the electronic device in the global map; performing pose calculation according to the video image collected by the camera and the global sub-map to obtain the pose data of the electronic device; and displaying the virtual object on the display component (or displaying the position and posture of the virtual object on the display component) based on the pose data of the electronic device.
  • The pose data of the electronic device may be pose data in the first coordinate system (the local coordinate system of the SLAM map generated by the SLAM system), or pose data in the second coordinate system (the global coordinate system corresponding to the global sub-map).
  • The electronic device can download the global sub-map from the server, collect video images of the environment through the local camera, combine the collected video images and the downloaded global sub-map to determine the current pose of the electronic device, and then display the position and posture of the virtual object on the display component based on the current pose of the electronic device.
  • the virtual object may correspondingly be a virtual object in a VR scene, an AR scene, or an MR scene (that is, an object in a virtual environment).
  • the display component of the electronic device may specifically include a display panel, and may also include lenses (for example, VR glasses) or a projection screen.
  • a global map is a high-precision map with a large geographic area.
  • the so-called "large geographic area” is relative to the geographic area represented by the SLAM map in the electronic device.
  • The global map can be obtained by integrating SLAM maps generated by one or more electronic devices according to certain rules.
  • The global sub-map is the sub-map corresponding to the location of the electronic device in the global map; that is, taking the actual location of the electronic device as a starting point, the map content within a preset area around that starting point serves as the global sub-map.
  • The electronic device requests the server to download the global sub-map and uses the global sub-map, which has higher accuracy than the SLAM map, as a visual observation input to the SLAM system.
  • The SLAM system uses the global sub-map to estimate the position and pose of the electronic device, which effectively reduces or even eliminates the pose drift that occurs in long-term pose estimation. This ensures that during long-term (for example, more than 1 minute) operation of a VR, AR, or MR application, the display position and direction of the virtual object on the electronic device will not shift, and the virtual object is displayed correctly for a long time (for example, displayed in a relatively accurate manner for the environment and duration represented by the video image), thereby improving the user experience.
  • The performing of pose calculation according to the video image collected by the camera and the global sub-map to obtain the pose data of the electronic device includes: performing pose calculation at a first frequency, at least according to the video image collected by the camera and the global sub-map, to obtain the pose data of the electronic device.
  • the first frequency represents the frequency at which the SLAM system in the electronic device performs global pose estimation, that is, the frequency at which the global sub-map is called.
  • The first frequency may be numerically less than or equal to the frequency at which the video stream is displayed on the display panel.
  • The value range of the first frequency may be 10 to 30 Hz, that is, 10 to 30 frames per second; in other words, the frequency of calling the global sub-map may be a value between 10 and 30 frames per second.
  • the SLAM system of the electronic device can call the global sub-map at a relatively high fixed frequency to realize the posture tracking of the electronic device.
  • the position and pose of the virtual object in the AR scene can be displayed and updated in real time in the display component according to the global pose of the electronic device.
  • The position and posture of the virtual object will not jump. This is because, on the one hand, during this period the SLAM system in the electronic device calculates the global pose of the electronic device based on the global sub-map, which overcomes the inaccuracy of the pre-built SLAM map in the electronic device and avoids the accumulation of pose errors as far as possible, suppressing drift to the greatest extent. On the other hand, the electronic device can perform global pose estimation based on the global sub-map stably and at a high frequency, which greatly reduces sudden changes in the global pose. Moreover, the process of calculating the global pose is completed on the electronic device side, so the algorithmic delay of pose estimation is low and the pose tracking effect is good. Therefore, this embodiment can display the virtual object for a long time without the virtual object becoming misaligned on the screen, eliminating the jumping of the virtual object caused by sudden pose changes and further improving the user experience.
  • The pose data of the electronic device is the pose data of the electronic device in the first coordinate system, or the pose data of the electronic device in the second coordinate system; the first coordinate system is the coordinate system of the SLAM map of the SLAM system, and the second coordinate system is the coordinate system of the global sub-map.
  • The performing of pose calculation based on the video image collected by the camera and the global sub-map to obtain the pose data of the electronic device includes: performing pose calculation according to the video image collected by the camera, the global sub-map, and the motion data collected by the electronic device, to obtain the pose data of the electronic device, where the motion data includes movement speed data and movement direction data.
  • the motion data collected by the electronic device may be, for example, motion data collected by an inertial measurement unit (IMU) in the electronic device.
  • IMU: inertial measurement unit
  • The introduction of the IMU further improves the accuracy of the estimated global pose, ensures that the 3D features of the global sub-map act effectively as measurement values in the SLAM algorithm, and, by performing pose estimation with high precision, avoids pose drift and jumping.
  • the method further includes: the electronic device updates the SLAM map of the SLAM system according to the posture data of the electronic device.
  • Before the updating of the SLAM map of the SLAM system according to the pose data of the electronic device, the method further includes: determining, according to the K-th frame image in the video images collected by the camera and the SLAM map in the first coordinate system, the first pose data of the electronic device in the SLAM map in the first coordinate system, where K is an integer greater than or equal to 1; determining, according to the K-th frame image and the global sub-map in the second coordinate system, the second pose data of the electronic device in the global sub-map in the second coordinate system; obtaining, according to the first pose data and the second pose data, coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map; and transforming the SLAM map in the first coordinate system into the SLAM map in the second coordinate system according to the coordinate system transformation information. Correspondingly, the updating of the SLAM map of the SLAM system according to the pose data of the electronic device includes: using the pose data of the electronic device as the pose data in the SLAM map in the second coordinate system to update the SLAM map in the second coordinate system.
  • The embodiment of the application obtains the pose of the terminal in the local coordinate system and its pose in the global coordinate system from the same frame. Based on these two poses, the coordinate system transformation information between the two coordinate systems can be obtained (for example, a coordinate system transformation matrix). With this coordinate system transformation matrix, the two coordinate systems can be synchronized: information originally expressed in the local coordinate system (such as the local pose, image feature points, and 3D map points of the SLAM map) can be transformed into the global coordinate system based on the coordinate system transformation matrix. In this way, the pose and 3D map points in the SLAM system are expressed in the same coordinate system as the 3D map points in the global sub-map.
  • The 3D map points in the global sub-map can be used as measurement-value input to the SLAM system, realizing tight coupling between the global sub-map and the SLAM system; the global pose of the electronic device is then tracked in real time through pose estimation, which effectively eliminates the drift of SLAM pose tracking.
  • the global pose of the electronic device can be used as the pose data in the SLAM map in the global coordinate system to update the SLAM map in the second coordinate system.
  • The acquiring of the global sub-map of the global map includes: sending first location fingerprint information indicating the initial location of the electronic device to the server; and receiving the global sub-map from the server, where the global sub-map corresponds to second location fingerprint information and the first location fingerprint information matches the second location fingerprint information. Performing the map matching operation on the server improves matching efficiency and accuracy, which in turn helps reduce the map download delay.
  • the virtual object is a virtual object in a virtual reality VR scene, an augmented reality AR scene, or a mixed reality MR scene.
  • an embodiment of the present application provides an electronic device for displaying virtual objects, including: an interaction module, a data acquisition module, a communication module, and a SLAM module, where:
  • the interaction module is used to detect the operation of the user to open the application
  • The communication module is used to download a global sub-map in response to the operation and store it in the simultaneous localization and mapping (SLAM) module of the electronic device;
  • The global sub-map is the sub-map corresponding to the location of the electronic device in the global map;
  • The interaction module is also used to display the position and posture of the virtual object on the display component; the position and posture of the virtual object are calculated by the SLAM module through pose calculation performed at least according to the video image collected by the data collection module and the global sub-map.
  • the SLAM module may be the SLAM system described in the embodiment of the present application, for example, it may be the SLAM system 12 described in the following embodiments of this document.
  • The pose data of the electronic device is used to characterize the position and posture of the virtual object, and the pose data of the electronic device is obtained by the SLAM module performing pose calculation at a first frequency, at least according to the video image collected by the data collection module and the global sub-map.
  • The process of performing pose calculation by the SLAM module includes: performing pose calculation according to the video image collected by the data collection module, the global sub-map, and the motion data collected by the data collection module, to obtain the pose data of the electronic device, where the motion data includes motion speed data and motion direction data.
  • The communication module is specifically configured to: in response to the operation, send the indication information of the initial position of the electronic device to the server; and receive the global sub-map from the server, where the global sub-map is determined according to the initial position of the electronic device.
  • The indication information of the initial position of the electronic device includes first position fingerprint information for indicating the initial position of the electronic device; the global sub-map corresponds to second position fingerprint information, and the first position fingerprint information matches the second position fingerprint information.
  • the SLAM module is further configured to update the SLAM map of the SLAM module according to the pose data of the electronic device.
  • the electronic device further includes a global positioning module and a coordinate system change matrix calculation module;
  • The SLAM module is specifically configured to determine, according to the K-th frame image in the video image collected by the data acquisition module and the SLAM map in the first coordinate system, the first pose data of the electronic device in the SLAM map in the first coordinate system, where K is an integer greater than or equal to 1;
  • The global positioning module is specifically configured to determine, according to the K-th frame image and the global sub-map in the second coordinate system, the second pose data of the electronic device in the global sub-map in the second coordinate system;
  • The coordinate system change matrix calculation module is specifically configured to obtain, according to the first pose data and the second pose data, the coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map;
  • The SLAM module is further configured to transform the SLAM map in the first coordinate system into the SLAM map in the second coordinate system according to the coordinate system transformation information, and to use the pose data of the electronic device as the pose data in the SLAM map in the second coordinate system to update the SLAM map in the second coordinate system.
  • The SLAM module is specifically configured to obtain the first pose data of the electronic device in the SLAM map in the first coordinate system according to the K-th frame image, the SLAM map in the first coordinate system, and the motion data collected by the data collection module; the motion data includes motion speed data and motion direction data.
  • The global positioning module is specifically configured to: perform feature extraction on the K-th frame image to obtain image features; perform feature matching between the image features and the global sub-map in the second coordinate system to obtain map features that match the image features; and calculate, according to the image features and the map features, the second pose data of the electronic device in the global sub-map in the second coordinate system.
  • The communication module is further configured to: send the K-th frame image to a server; and receive the second pose data from the server, where the second pose data is determined by the server through feature extraction and feature matching according to the K-th frame image and the global sub-map in the second coordinate system.
  • The interaction module is specifically configured to: display a first interface on the display component, and display a video stream and a virtual object on the first interface; the position and posture of the virtual object relative to the video stream are displayed based on the pose data of the electronic device, and the pose data of the electronic device is obtained by performing pose calculation at least according to the video image collected by the data collection module and the global sub-map. A skeleton of these modules follows.
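For orientation, the following is a skeleton of the four claimed modules as Python classes; all method names are assumptions for illustration, not the claim language.

```python
# Illustrative decomposition of the claimed modules: the interaction module
# handles UI events and display, the data acquisition module wraps the camera
# and motion sensors, the communication module talks to the server, and the
# SLAM module owns pose calculation and map maintenance.
class InteractionModule:
    def detect_open_application(self): ...
    def display(self, virtual_object, pose_data): ...

class DataAcquisitionModule:
    def next_video_frame(self): ...
    def read_motion_data(self): ...       # motion speed + direction (e.g. IMU)

class CommunicationModule:
    def download_global_submap(self, initial_position_info): ...
    def send_frame(self, frame): ...      # for optional server-side localization

class SlamModule:
    def store_submap(self, submap): ...
    def pose_calculation(self, frame, motion_data): ...
    def update_slam_map(self, pose_data): ...
```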
  • an embodiment of the present application provides an electronic device for displaying virtual objects, including an interaction module, a data acquisition module, a communication module, and a SLAM module, where:
  • The communication module is configured to obtain a global sub-map and store it in the SLAM module; the global sub-map is a sub-map corresponding to the location of the electronic device in the global map;
  • the SLAM module is configured to perform pose calculation according to the video image collected by the data collection module and the global sub-map to obtain the pose data of the electronic device;
  • the interaction module is configured to display the virtual object on the display component based on the pose data of the electronic device.
  • the SLAM module may be the SLAM system described in the embodiment of the present application, for example, it may be the SLAM system 12 described in the following embodiments of this document.
  • The SLAM module is specifically configured to perform pose calculation at a first frequency, at least according to the video image collected by the data collection module and the global sub-map, to obtain the pose data of the electronic device.
  • The pose data of the electronic device is the pose data of the electronic device in the first coordinate system, or the pose data of the electronic device in the second coordinate system; the first coordinate system is the coordinate system of the SLAM map of the SLAM module, and the second coordinate system is the coordinate system of the global sub-map.
  • The SLAM module is specifically configured to perform pose calculation based on the video images collected by the data collection module, the global sub-map, and the motion data collected by the data collection module, to obtain the pose data of the electronic device, where the motion data includes motion speed data and motion direction data.
  • the SLAM module is further configured to update the SLAM map of the SLAM module according to the pose data of the electronic device.
  • the electronic device further includes a global positioning module and a coordinate system change matrix calculation module;
  • The SLAM module is specifically configured to determine, according to the K-th frame image in the video image collected by the data acquisition module and the SLAM map in the first coordinate system, the first pose data of the electronic device in the SLAM map in the first coordinate system, where K is an integer greater than or equal to 1;
  • The global positioning module is specifically configured to determine, according to the K-th frame image and the global sub-map in the second coordinate system, the second pose data of the electronic device in the global sub-map in the second coordinate system;
  • The coordinate system change matrix calculation module is specifically configured to obtain, according to the first pose data and the second pose data, the coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map;
  • The SLAM module is further configured to transform the SLAM map in the first coordinate system into the SLAM map in the second coordinate system according to the coordinate system transformation information, and to use the pose data of the electronic device as the pose data in the SLAM map in the second coordinate system to update the SLAM map in the second coordinate system.
  • The communication module is further configured to: send first location fingerprint information indicating the initial location of the electronic device to the server; and receive the global sub-map from the server, where the global sub-map corresponds to second location fingerprint information and the first location fingerprint information matches the second location fingerprint information.
  • the virtual object is a virtual object in a virtual reality VR scene, an augmented reality AR scene, or a mixed reality MR scene.
  • Embodiments of the present application provide an electronic device for displaying virtual objects, including: a display component, a camera, one or more processors, a memory, one or more application programs, and one or more computer programs; the one or more computer programs are stored in the memory and include instructions; when the instructions are executed by the electronic device, the electronic device executes the virtual object display method described in the first aspect or any possible implementation of the first aspect.
  • An embodiment of the present application provides another electronic device for displaying virtual objects, including: a display component, a camera, one or more processors, a memory, and one or more computer programs; the one or more computer programs are stored in the memory and include instructions; when the instructions are executed by the electronic device, the electronic device executes the virtual object display method described in the second aspect or any possible implementation of the second aspect.
  • an embodiment of the present application provides a chip.
  • the chip includes a processor and a data interface.
  • The processor reads instructions stored in a memory through the data interface, and executes the virtual object display method in the first aspect or any possible implementation of the first aspect.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute instructions stored on the memory.
  • the processor is configured to execute the virtual object display method in the first aspect or any possible implementation of the first aspect.
  • an embodiment of the present application provides a chip.
  • the chip includes a processor and a data interface.
  • The processor reads instructions stored in a memory through the data interface, and executes the virtual object display method in the second aspect or any possible implementation of the second aspect.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute instructions stored on the memory.
  • the processor is configured to execute the virtual object display method in the second aspect or any possible implementation manner of the second aspect.
  • An embodiment of the present application provides a computer-readable storage medium, where the computer-readable medium stores program code for device execution, and the program code includes program code for executing the method in the first aspect or any possible implementation of the first aspect.
  • an embodiment of the present invention provides a computer program product.
  • the computer program product may be a software installation package.
  • The computer program product includes program instructions; when the program instructions are executed by a processor, the processor executes the method in any possible implementation of the foregoing first aspect or second aspect.
  • The electronic device requests the server to download the global sub-map and uses the global sub-map, which has higher accuracy than the SLAM map, as a visual observation input to the SLAM system.
  • The SLAM system of the electronic device can call the global sub-map at a relatively high fixed frequency to realize pose tracking of the electronic device.
  • In this way, the position and pose of the virtual object in the AR scene can be displayed and updated in real time on the display component according to the global pose of the electronic device, and during long-term pose updates the position and posture of the virtual object will not drift or jump.
  • FIG. 1 is a schematic diagram of an application architecture provided by an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • Figure 4 is a comparison diagram of a situation where pose drift occurs in an AR scene with the ideal situation
  • FIG. 5 is a system provided by an embodiment of the present application, and a schematic diagram of the structure of an electronic device and a server in the system;
  • FIG. 6 is another system provided by an embodiment of the present application, and a schematic diagram of the structure of electronic devices and servers in the system;
  • FIG. 7 is a schematic flowchart of a method for displaying a virtual object provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a scenario implemented by using the method of the present application provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of yet another scenario implemented by the method of this application provided by an embodiment of this application.
  • FIG. 10 is a schematic diagram of yet another scenario implemented by the method of this application provided by an embodiment of this application.
  • FIG. 11 is a schematic flowchart of yet another virtual object display method provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of yet another virtual object display method provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a scene related to a coordinate system transformation matrix provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another system provided by an embodiment of the present application, and of the electronic device and server in the system;
  • FIG. 15 is a schematic flowchart of another virtual object display method provided by an embodiment of the present application.
  • the application architecture provided by the embodiment of the present application includes an electronic device 10 and a server 20.
  • the electronic device 10 and the server 20 can communicate.
  • the electronic device 10 can communicate with the server 20 via, for example, wireless fidelity (WiFi) communication, Bluetooth communication, or cellular 2G/3G/4G/5G communication, among other methods.
  • the electronic device 10 can be various types of devices equipped with cameras and display components.
  • the electronic device 10 can be a terminal device such as a mobile phone, a tablet computer, a notebook computer, a video recorder, etc. (as shown in Figure 1, the electronic device is a mobile phone as an example).
  • it can also be a device used for virtual scene interaction, such as VR glasses, an AR device, or an MR interactive device; a wearable electronic device such as a smart watch or a smart bracelet; or vehicle-mounted equipment on vehicles such as unmanned cars and drones.
  • the embodiments of this application do not impose special restrictions on the specific form of the electronic device.
  • the electronic device 10 may also be referred to as user equipment (UE), subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, terminal device , Access terminal, mobile terminal, wireless terminal, smart terminal, remote terminal, handset, user agent, mobile client, client, or some other appropriate term.
  • the server 20 may specifically be one or more physical servers (for example, one physical server is exemplarily shown in FIG. 1), a computer cluster, or a virtual machine in a cloud computing scenario, and so on.
  • the electronic device 10 can install virtual scene applications such as VR or AR or MR applications, and can run the VR or AR or MR applications based on user operations (such as clicking, touching, sliding, shaking, voice control, etc.) .
  • the electronic device can collect a video image of any object in the environment through a local camera and/or sensor, and display the virtual object on the display component according to the collected video image.
  • the virtual object may correspondingly be a virtual object in a VR scene, an AR scene, or an MR scene (that is, an object in a virtual environment).
  • the virtual scene application in the electronic device 10 may be an application built into the electronic device itself, or an application provided by a third-party service provider and installed by the user; this is not specifically limited here.
  • the electronic device 10 is also equipped with an instant positioning and mapping (simultaneous localization and mapping, SLAM) system.
  • the SLAM system can create a map in a completely unknown environment, and use the map for autonomous positioning, pose (position and posture) determination, navigation, and so on.
  • the map constructed by the SLAM system can be referred to as the SLAM map.
  • the SLAM map can be understood as a map drawn by the SLAM system according to the environment information collected by a collection device.
  • the collection device may include, for example, an image collection device in the electronic device (such as a camera) and an inertial measurement unit (Inertial Measurement Unit, IMU).
  • the IMU may include sensors such as gyroscopes and accelerometers.
  • the SLAM map may contain the following map content: multiple key frames, triangulated feature points, and the association between key frames and feature points.
  • the key frame may be formed based on the image collected by the camera and the camera parameters used to generate the image (for example, the pose of the electronic device in the SLAM coordinate system).
  • the feature points represent 3D map points at different positions in the three-dimensional space of the SLAM map, together with the feature descriptions of those 3D map points. Each feature point can have an associated feature location. Each feature point can represent a 3D coordinate position and is associated with one or more descriptors.
  • feature points may also be called 3D features, 3D feature points, or other suitable names.
  • 3D map points represent coordinates on the three-dimensional space axes X, Y, and Z.
  • in some embodiments, the 3D map points represent coordinates on the X, Y, and Z axes of the local coordinate system.
  • in other embodiments, the 3D map points represent coordinates on the X, Y, and Z axes of the global coordinate system.
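  • for a concrete picture only, the SLAM map content described above (key frames, triangulated feature points with descriptors, and the associations between them) could be sketched as below; this is an illustrative assumption, not the patent's actual data layout, and all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FeaturePoint:
    # 3D coordinates (X, Y, Z) in the map's coordinate system (local or global)
    position: np.ndarray
    # one or more descriptors associated with this 3D map point
    descriptors: list = field(default_factory=list)

@dataclass
class KeyFrame:
    image: np.ndarray          # image collected by the camera
    pose: np.ndarray           # 4x4 pose of the device when the image was taken
    # indices into SlamMap.feature_points: the association between key frames
    # and feature points mentioned above
    observed_points: list = field(default_factory=list)

@dataclass
class SlamMap:
    key_frames: list = field(default_factory=list)
    feature_points: list = field(default_factory=list)
```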
  • the coordinate system used to construct the SLAM map can be called the first coordinate system.
  • in some application scenarios, the first coordinate system in this article may also be called the local coordinate system, the SLAM coordinate system, the camera coordinate system, or some other appropriate term.
  • for ease of understanding, the following text mainly introduces the scheme under the name "local coordinate system".
  • the pose embodied by the electronic device in the local coordinate system can be called a local pose.
  • the server 20 can be used as a platform for providing content and information support to the VR or AR or MR application of the electronic device 10.
  • a map is also stored in the server 20, and this map may be referred to as a global map for short.
  • the global map covers a larger area than the SLAM map in a single electronic device, its map content is more accurate, and it is maintained and updated by the server.
  • the global map can be constructed offline in the server in advance.
  • the global map may be obtained by integrating multiple SLAM maps collected by one or more electronic devices according to certain rules.
  • the coordinate system used to construct the global map may be called the second coordinate system.
  • the second coordinate system may also be called the global coordinate system, the world coordinate system, or some other appropriate terminology.
  • the following text will mainly use the name "global coordinate system" to introduce the scheme.
  • the posture embodied by the electronic device in the global coordinate system can be referred to as the global posture.
  • FIG. 2 exemplarily shows a schematic structural diagram of the electronic device 10. It should be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 10. In other embodiments of the present application, the electronic device 10 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • the electronic device 10 may include: a chip 310, a memory 315 (one or more computer-readable storage media), a user interface 322, a display component 323, a camera 324, a positioning module 331 for device positioning, and a transceiver 332 for communication. These components may communicate over one or more communication buses 314.
  • the chip 310 may integrate: one or more processors 311, a clock module 312, and a power management module 313.
  • the clock module 312 integrated in the chip 310 is mainly used to provide the processor 311 with a timer required for data transmission and timing control, and the timer can realize the clock function of data transmission and timing control.
  • the processor 311 can perform operations according to the instruction operation code and timing signals, generate operation control signals, and complete the control of fetching instructions and executing instructions.
  • the power management module 313 integrated in the chip 310 is mainly used to provide a stable and high-precision voltage for the chip 310 and other components of the electronic device 10.
  • the processor 110 may also be referred to as a central processing unit (central processing unit, CPU).
  • the processor 110 may specifically include one or more processing units.
  • for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • the memory 315 may be connected with the processor 311 through a bus, or may be coupled with the processor 311, and used to store various software programs and/or multiple sets of instructions.
  • the memory 315 may include a high-speed random access memory (such as a cache memory), and may also include a non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • the memory 315 may store an operating system, such as an embedded operating system such as ANDROID, IOS, WINDOWS, or LINUX.
  • the memory 315 is also used to store related programs of the SLAM system.
  • the memory 315 is used to store data (for example, image data, point cloud data, map data, key frame data, pose data, coordinate system conversion information, etc.).
  • the memory 315 may also store a communication program, which may be used to communicate with one or more servers or other devices.
  • the memory 315 may also store one or more application programs. As shown in the figure, these applications may include: virtual scene applications such as AR/VR/MR, map applications, image management applications, and so on.
  • the memory 315 can also store a user interface program.
  • the user interface program can vividly present the content of an application program (such as virtual objects in virtual scenes such as AR/VR/MR) through a graphical operation interface, display it through the display component 323, and receive the user's control operations on the application program through input controls such as menus, dialog boxes, and buttons.
  • the memory 315 may be used to store computer executable program code, and the executable program code includes instructions.
  • the user interface 322 may be, for example, a touch panel, through which the user's operation instructions on the touch panel can be detected, and the user interface 322 may also be a small keyboard, a physical button, or a mouse.
  • the electronic device 10 may include one or more display components 323.
  • the electronic device 10 may jointly implement a display function through the display component 323, a graphics processing unit (GPU) and an application processor (AP) in the chip 310, and so on.
  • the GPU is a microprocessor used for image processing, and is connected to the display component 323 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the display component 323 is used to display the interface content currently output by the system, such as displaying images and videos in virtual scenes such as AR/VR/MR.
  • the interface content can include the interface of the running application, a system-level menu, and so on, and may be composed of the following interface elements: input interface elements, such as buttons (Button), text input boxes (Text), scroll bars (Scroll Bar), and menus (Menu); and output interface elements, such as windows, labels (Label), images, videos, and animations.
  • the display component 323 may be a display panel, lenses (for example, VR glasses), a projection screen, and the like.
  • the display panel may also be called a display screen; for example, it may be a touch screen, a flexible screen, a curved screen, or another optical component. That is to say, when the electronic device in this application has a display screen, the display screen can be a touch screen, a flexible screen, a curved screen, or another form of screen.
  • the display screen of the electronic device has the function of displaying images; the specific material and shape of the display screen are not limited in this application.
  • the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, quantum dot light-emitting diodes (QLED), and so on.
  • the touch panel in the user interface 322 and the display panel in the display assembly 323 can be coupled together.
  • for example, the touch panel can be disposed under the display panel, and can be used to detect the touch pressure acting on the display panel when the user inputs a touch operation (such as a click, slide, or touch) through the display panel, while the display panel is used for content display.
  • the camera 324 may be a monocular camera, a binocular camera, or a depth camera, and is used to photograph/video the environment to obtain an image/video image.
  • the image/video image collected by the camera 324 may be used as a kind of input data of the SLAM system, or the image/video image may be displayed through the display component 323, for example.
  • the camera 324 can also be regarded as a sensor.
  • the image collected by the camera 324 may be in IMG format or other format types, which is not limited here.
  • the sensor 325 can be used to collect data related to the state change (such as rotation, swing, movement, jitter, etc.) of the electronic device 10, and the data collected by the sensor 325 can be used as a kind of input data of the SLAM system, for example.
  • the sensor 325 may include one or more sensors, such as an Inertial Measurement Unit (IMU), a Time of Flight (TOF) sensor, and so on.
  • the IMU may further include sensors such as gyroscopes and accelerometers.
  • the gyroscope can be used to measure the angular velocity of the electronic device when it moves, and the accelerometer is used to measure the acceleration of the electronic device when it moves.
  • the TOF sensor can further include a light transmitter and a light receiver.
  • the light transmitter can be used to emit light, such as laser, infrared, or radar waves;
  • the light receiver can be used to detect the reflected light, such as reflected laser, infrared, or radar waves.
  • the sensor 325 may also include more other sensors, such as inertial sensors, barometers, magnetometers, wheel speedometers, and so on.
  • the positioning module 331 is used to implement physical positioning of the electronic device 10, for example, to obtain the initial position of the electronic device 10.
  • the positioning module 331 may include, for example, one or more of a WIFI positioning module, a Bluetooth positioning module, a base station positioning module, and a satellite positioning module.
  • the satellite positioning module can be equipped with a Global Navigation Satellite System (GNSS) to assist positioning.
  • GNSS includes but is not limited to the Beidou system, the GPS system, the GLONASS system, and the Galileo system.
  • the transceiver 332 is used to implement communication between the electronic device 10 and a server or other terminal devices.
  • the transceiver 332 integrates a transmitter and a receiver, which are used to send and receive radio frequency signals, respectively.
  • the transceiver 332 may include, but is not limited to: an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card, a storage medium, etc.
  • the transceiver 332 may also be implemented on a separate chip.
  • the transceiver 332 may, for example, support data network communication through at least one of 2G/3G/4G/5G, etc., and/or support at least one of the following short-range wireless communication methods: Bluetooth (BT) communication, Wireless Fidelity (WiFi) communication, Near Field Communication (NFC), Infrared (IR) wireless communication, Ultra Wide Band (UWB) communication, ZigBee communication.
  • the processor 311 executes various functional applications and data processing of the electronic device 10 by running instructions stored in the memory 315. Specifically, the processor 311 may execute the method steps shown in the embodiment of FIG. 7, or may execute the functions on the electronic device side in the embodiments shown in FIG. 11, FIG. 12, or FIG. 15.
  • FIG. 3 is a structural block diagram of an implementation manner of a server 20 according to an embodiment of the present application.
  • the server 20 includes a processor 403, a memory 401 (one or more computer-readable storage media), and a transceiver 402. These components may communicate over one or more communication buses 404. Among them:
  • the processor 403 may be one or more central processing units (CPU). In the case where the processor 403 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the memory 401 may be connected with the processor 403 via a bus, or may be coupled with the processor 403 to store various software programs and/or multiple sets of instructions and data (for example, map data, pose data, etc.).
  • the memory 401 includes, but is not limited to, random access memory (RAM), read-only memory (Read-Only Memory, ROM), and erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), or portable read-only memory (Compact Disc Read-Only Memory, CD-ROM).
  • the transceiver 402 mainly integrates a receiver and a transmitter, where the receiver is used to receive data (such as requests, images, etc.) sent by the electronic device, and the transmitter is used to send data (such as map data, pose data, etc.) to the electronic device.
  • the server 20 is only an example provided by the embodiment of the present application; in a specific implementation, the server 20 may have more components than shown in the figure.
  • the processor 403 may be used to call program instructions in the memory 401 to perform the server-side functions in the embodiment shown in FIG. 11, FIG. 12, or FIG. 15.
  • "coupled" means directly connected to, or connected through, one or more intervening components or circuits. Any signal provided on the various buses described herein can be time-multiplexed with other signals and provided on one or more shared buses.
  • the interconnection between various circuit elements or software blocks may be shown as a bus or a single signal line. Each bus may alternatively be a single signal line, each single signal line may alternatively be a bus, and a single wire or bus may represent any one or more of a large number of physical or logical mechanisms for communication between components.
  • the position/posture of the virtual object presented in real time on the display component of the electronic device is closely related to the position/posture of the electronic device itself; that is, the position/posture of the electronic device itself determines the shape, size, content, etc. of the virtual objects presented. Therefore, it is necessary to track the pose of the electronic device.
  • the SLAM system extracts features from the input image and obtains one estimated pose based on the built SLAM map; it also processes the acceleration, angular velocity, and other information collected by the IMU to obtain another estimated pose; finally, the two estimated poses are fused through a certain algorithm to obtain the final pose of the electronic device.
  • the electronic device in the existing solution transmits the current image to the server, the server calculates the pose of the electronic device in the global coordinate system based on the current image and transmits it back to the electronic device (with an intermediate delay of, for example, 2 seconds), and the electronic device then calculates and updates the coordinate transformation matrix between the global coordinate system and the local coordinate system.
  • in this case, the currently calculated coordinate system transformation matrix may differ significantly from the previously calculated one. Therefore, when the SLAM system uses the currently calculated coordinate system transformation matrix to update the local pose, the local pose will change significantly, causing the presented virtual objects to jump.
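  • the jump described above can be made concrete with a small numeric sketch (all values made up): the same local position mapped through two noticeably different coordinate system transformation matrices lands at noticeably different global positions, so the rendered virtual object appears to bounce when the matrix is refreshed.

```python
import numpy as np

def to_global(T_global_from_local, p_local):
    # apply a 4x4 homogeneous transform to a 3D point
    return (T_global_from_local @ np.append(p_local, 1.0))[:3]

T_old = np.eye(4); T_old[:3, 3] = [10.0, 5.0, 0.0]   # transform computed earlier
T_new = np.eye(4); T_new[:3, 3] = [10.8, 5.5, 0.0]   # transform after the ~2 s server round trip

p_local = np.array([1.0, 2.0, 0.0])                  # current local position of the device
print(to_global(T_old, p_local))                     # where the virtual object was rendered
print(to_global(T_new, p_local))                     # where it suddenly lands after the update
```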
  • FIG. 4 shows a scene where a pose shift occurs and an ideal scene.
  • 104 represents the real image in the environment
  • the navigation instructions 105, the storefront instructions 106, and the interactive robot 107 are all virtual objects generated using graphics technology and visualization technology.
  • the position and posture of these virtual objects in the screen can be calculated, so that these virtual objects are integrated with the real environment image to present the user with a realistic sensory experience.
  • if the SLAM system exhibits pose drift, these virtual objects will appear unreasonably shifted in the screen, resulting in distortion of the fusion effect of the virtual objects and the real environment image.
  • when the SLAM system corrects the drift, these virtual objects in the screen jump from the unreasonably offset state to the normal ideal state, so the user sees the phenomenon of "virtual object jumping", which also brings the user a bad experience.
  • FIG. 5 is a structural block diagram of a system provided by an embodiment of the present application.
  • the system includes an electronic device 10 and a server 20, and the electronic device 10 and the server 20 can communicate with each other through respective transceivers.
  • the electronic device 10 is configured with a communication module 11, a SLAM system 12, a data acquisition module 13, and an interaction module 14. The communication module 11, the SLAM system 12, the data acquisition module 13, and the interaction module 14 can exist in the form of software code.
  • the data/programs of these functional modules can be stored in the memory 315 shown in FIG. 2 and run on the processor 311 shown in FIG. 2. Among them:
  • the communication module 11 can use the transceiver 332 shown in FIG. 2 to communicate with the server 20. Specifically, the communication module 11 is configured to obtain a global sub-map from the server 20; the global sub-map is a sub-map, in the global map stored by the server 20, corresponding to the position information of the electronic device 10, and is stored in the database 123 of the global sub-map in the SLAM system 12 of the electronic device 10.
  • the data acquisition module 13 is configured to use the sensor 325 shown in FIG. 2 to obtain state data of the electronic device 10, the camera 324 shown in FIG. 2 to obtain video images, and the positioning module 331 shown in FIG. 2 to obtain the location of the electronic device.
  • the interaction module 14 is configured to use the user interface 322 shown in FIG. 2 to realize the detection and acquisition of user operations, and to use the display component 323 shown in FIG. 2 to realize the display of images/videos/virtual objects, such as the display of application content of AR/VR/MR.
  • the calculation module 121 in the SLAM system 12 is configured to perform pose calculation according to the video image collected by the camera 324 and the downloaded global sub-map to obtain the pose data of the electronic device 10; the interaction module 14 may display virtual objects on the display component based on the pose data of the electronic device 10.
  • the SLAM map constructed by the SLAM system 12 itself is stored in the SLAM map database 122, and the SLAM system is also configured to update the SLAM map in the database 122 based on the pose data of the electronic device 10.
  • the functional modules in the electronic device 10 can cooperate with each other to execute the steps in the method shown in the embodiment of FIG. 7, or execute the functions on the electronic device side in the embodiments of FIG. 11, FIG. 12, or FIG. 15.
  • the server 20 is configured with a communication module 21, a processing module 22, and a database 23 of the global map.
  • the communication module 21, the processing module 22, and the database 23 of the global map may exist in the form of software code.
  • the data/programs of these functional modules can be stored in the memory 401 shown in FIG. 3 and run on the processor 403 shown in FIG. 3. Among them:
  • the database 23 of the global map is used to store, maintain and update the global map.
  • the processing module 22 may be configured to obtain a sub-map corresponding to the location information of the electronic device 10 from the global map stored in the database 23 based on the location information of the electronic device 10, that is, a global sub-map.
  • the communication module 21 can use the transceiver 402 as shown in FIG. 3 to realize communication with the electronic device 10. Specifically, the communication module 21 may send the global sub-map to the electronic device 10.
  • the functional modules in the server 20 can cooperate with each other to perform server-side functions in the embodiment shown in FIG. 11, FIG. 12, or FIG. 15.
  • FIG. 6 shows the components (or sub-modules) that may be further included, in a specific implementation, in each functional module of the electronic device 10 shown in FIG. 5 (for example, the SLAM system 12, the data acquisition module 13, and the interaction module 14), and the components (or sub-modules) that may be further included in the functional modules of the server 20 (for example, the processing module 22).
  • these components (or sub-modules) are only examples of the embodiments of the present application; in other embodiments, the above-mentioned functional modules may include more or fewer components (or sub-modules).
  • the calculation module 121 in the SLAM system 12 further includes a mapping module 1211, a pose estimation module 1212, a feature processing module 1213, and a closed-loop correction module 1214.
  • the electronic device 10 also includes a global positioning module 16 and a software development kit (Software Development Kit, SDK).
  • the global positioning module 16 further includes an image retrieval module 161, a feature extraction module 162, a feature matching module 163, and a pose estimation module 164.
  • the SDK may include a database for the global pose and the local pose respectively, and the coordinate system change matrix calculation module 15.
  • the SDK may also call the interaction module 14 to realize display through display components.
  • the feature processing module 1213 may be used for operations related to visual feature processing.
  • the feature processing module 1213 may further include a feature extraction module (not shown) and a feature matching module (not shown).
  • the feature extraction module includes a feature detection function and a feature description function.
  • the feature detection function extracts the image position of each feature in the image.
  • the feature description function describes each detected feature to form a one-dimensional vector, which is used by the feature matching module to match features.
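  • as an illustration of the two functions above, the following sketch uses ORB from OpenCV as a stand-in detector/descriptor; the patent does not name a specific feature algorithm, and the synthetic image merely stands in for a camera frame.

```python
import cv2
import numpy as np

img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # synthetic stand-in frame

orb = cv2.ORB_create()
keypoints = orb.detect(img, None)                     # feature detection: image positions
keypoints, descriptors = orb.compute(img, keypoints)  # feature description: one vector per feature

# Each row of `descriptors` is the one-dimensional vector that the feature
# matching module compares, e.g. by Hamming distance for binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
```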
  • the data acquisition module 13 (such as an IMU sensor) can output high-frequency angular velocity and linear acceleration.
  • the pose estimation module 1212 integrates the angular velocity and linear acceleration separately, and combines them with the video images taken by the camera to estimate and calculate the position and posture of the electronic device.
  • the result of the pose estimation can be used as the output of the SLAM system.
  • the result of the pose estimation can also be used as the input of the mapping module 1211.
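  • a minimal dead-reckoning sketch of the integration described above is given below: angular velocity is integrated into orientation, and gravity-compensated linear acceleration is integrated twice into position. Real SLAM systems use more robust schemes such as IMU preintegration; this only illustrates the data flow, and all values are made up.

```python
import numpy as np

def integrate_imu(orientation, velocity, position, gyro, accel, dt):
    orientation = orientation + gyro * dt   # small-angle orientation update (illustrative)
    velocity = velocity + accel * dt        # accel assumed gravity-compensated
    position = position + velocity * dt
    return orientation, velocity, position

ori, vel, pos = np.zeros(3), np.zeros(3), np.zeros(3)
for _ in range(200):                        # 200 samples at 100 Hz, i.e. 2 s of motion
    ori, vel, pos = integrate_imu(ori, vel, pos,
                                  gyro=np.array([0.0, 0.0, 0.01]),
                                  accel=np.array([0.1, 0.0, 0.0]),
                                  dt=0.01)
print(ori, vel, pos)
```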
  • the mapping module 1211 creates an environment map perceivable by the SLAM system under the local coordinate system, that is, the SLAM map. As the SLAM system continues to operate in the space, it is constantly creating/updating SLAM maps.
  • the closed-loop correction module 1214 can be used to reduce the errors that may be accumulated in the SLAM map.
  • the SLAM map can also be used as the input of the pose estimation module 1212 to estimate the pose, so as to improve the accuracy of the pose estimation.
  • the database 123 stores a global sub-map downloaded from the server 20, and the pose estimation module 1212 is configured to perform pose calculation based on the global sub-map, the video images collected by the camera 324, and the motion data collected by the sensor 325.
  • the pose calculation is used to obtain the pose data of the electronic device 10, that is, the global pose of the electronic device 10, so as to realize the tracking and positioning of the pose of the electronic device 10.
  • the movement data includes movement speed data and movement direction data of the electronic device 10, such as acceleration and angular velocity.
  • the feature processing module 1213 can be used to extract 2D features of the video image, and the pose estimation module 1212 can obtain the global pose of the electronic device 10 in the global sub-map based on the 2D features of the video image, the 3D map points of the global sub-map, and the movement data collected by the data collection module 13 (for example, an IMU).
  • the pose estimation module 1212 is configured to track the global pose of the electronic device 10 at a relatively high first frequency.
  • the first frequency represents the frequency at which the SLAM system performs global pose estimation, that is, the frequency at which the SLAM system calls the global sub-map in the database 123; the first frequency may be numerically less than or equal to the frequency at which the display component displays the video stream.
  • for example, the first frequency is 30 frames per second.
  • the pose estimation module 1212 can call the global sub-map at a relatively high fixed frequency to realize the pose tracking of the electronic device 10. In this way, after the global pose of the electronic device 10 is obtained in real time, the position and pose of the virtual object in the virtual scene such as AR/VR/MR can be displayed on the display component according to the global pose of the electronic device 10.
  • the pose estimation module 1212 of the SLAM system can also be used to determine the local pose of the electronic device 10 in the SLAM map in the local coordinate system (the local pose here can be called the first pose data) according to the K-th frame image in the video image sequence collected by the camera and the SLAM map in the local coordinate system, where K is an integer greater than or equal to 1.
  • specifically, the feature processing module 1213 can be used to extract 2D features in the K-th frame image, and the pose estimation module 1212 can obtain the first pose data of the electronic device 10 in the SLAM map in the local coordinate system based on the 2D features in the K-th frame image, the 3D map points of the SLAM map in the local coordinate system, and the movement data collected by the data collection module 13.
  • the movement data includes the movement speed data and movement direction data of the electronic device 10, such as acceleration and angular velocity.
  • the global positioning module 16 is used to determine, according to the K-th frame image in the video image sequence, the global pose of the electronic device 10 in the global sub-map (the global pose here can be called the second pose data), where K is an integer greater than or equal to 1.
  • specifically, the image retrieval module 161 can obtain the K-th frame image in the video image sequence;
  • the feature extraction module 162 performs feature extraction on the K-th frame image to obtain image features;
  • the feature matching module 163 performs feature matching between the image features and the global sub-map to obtain map features that match the image features;
  • the pose estimation module 164 calculates the pose of the electronic device 10 in the global sub-map, that is, the second pose data, according to the image features and the map features.
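  • the retrieval-extraction-matching-estimation pipeline above can be sketched as follows; a PnP solve is shown for the last step because it is a common choice once 2D image features have been matched to 3D map points, though the patent does not prescribe a specific solver. The intrinsics and point values are assumptions.

```python
import cv2
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])                       # assumed camera intrinsics

map_points = np.random.rand(12, 3) + [0.0, 0.0, 5.0]  # matched 3D map points (illustrative)
rvec_true = np.array([0.0, 0.1, 0.0])                 # ground-truth pose used only to
tvec_true = np.array([0.2, 0.0, 0.0])                 # synthesize the 2D observations below
image_points, _ = cv2.projectPoints(map_points, rvec_true, tvec_true, K, None)

# Recover the camera pose relative to the global sub-map, i.e. the second pose data.
ok, rvec, tvec = cv2.solvePnP(map_points, image_points, K, None)
print(ok, rvec.ravel(), tvec.ravel())
```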
  • the first pose data (local pose) and the second pose data (global pose) can be stored in the database in the SDK respectively.
  • the coordinate system change matrix calculation module 15 may be configured to calculate the coordinate system transformation information (for example, a coordinate system transformation matrix) between the local coordinate system of the SLAM map and the global coordinate system of the global map according to the first pose data and the second pose data, and to feed the coordinate system transformation information back to the SLAM system.
  • the mapping module 1211 in the SLAM system may be configured to update the SLAM map in the database 122 of the SLAM system according to the pose data of the electronic device 10. Specifically, the mapping module 1211 may transform the SLAM map to the global coordinate system according to the coordinate system transformation information, and use the global pose of the electronic device 10 as the pose data in the SLAM map in the global coordinate system to update the SLAM map in the global coordinate system.
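  • the calculation above can be sketched as follows, assuming both poses are expressed as 4x4 homogeneous matrices captured for the same frame; the function names are hypothetical.

```python
import numpy as np

def transform_from_pose_pair(T_global_pose, T_local_pose):
    # coordinate system transformation matrix: maps SLAM (local) coordinates
    # into the global coordinate system
    return T_global_pose @ np.linalg.inv(T_local_pose)

def map_point_to_global(T_global_from_local, p_local):
    return (T_global_from_local @ np.append(p_local, 1.0))[:3]

T_local = np.eye(4)                                         # first pose data (local)
T_global = np.eye(4); T_global[:3, 3] = [100.0, 50.0, 0.0]  # second pose data (global)
T_gl = transform_from_pose_pair(T_global, T_local)
print(map_point_to_global(T_gl, np.array([1.0, 2.0, 0.0])))  # -> [101.  52.   0.]
```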
  • the functional modules in the electronic device 10 can cooperate with each other to execute the steps in the method shown in the embodiment of FIG. 7 or execute the functions on the electronic device side in the embodiments of FIG. 11 and FIG. 12.
  • the processing module 22 in the server 20 further includes a sub-map processing module 221 and a location fingerprint matching module 222.
  • the location fingerprint matching module 222 is configured to search the global map for location fingerprint information (which may be referred to here as second location fingerprint information) that matches the location fingerprint information of the initial location sent by the electronic device 10 (which may be referred to as first location fingerprint information).
  • the sub-map processing module 221 is configured to retrieve the global sub-map with the second location fingerprint information from the global map in the database 23.
  • the global sub-map can also be additionally stored in the database of the server 20 for quick access next time.
  • the server 20 may send the global sub-map with the second location fingerprint information to the electronic device 10.
  • the functional modules in the server 20 can cooperate with each other to perform the server-side functions in the embodiments of FIG. 11 and FIG. 12.
  • FIG. 7 is a schematic flowchart of a virtual object display method provided by an embodiment of the present application.
  • the method can be applied to an electronic device with a display component and a camera. The method includes but is not limited to the following steps:
  • the global sub-map is a sub-map corresponding to the location of the electronic device in the global map.
  • S1013. Display the position and posture of the virtual object on the display component of the electronic device.
  • the position and posture of the virtual object are obtained by performing pose calculation at least according to the video image collected by the camera and the global submap.
  • the user inputs an operation (such as clicking, touching, sliding, shaking, or voice control) for opening an application (APP) on the electronic device. In response to the operation, on the one hand, the application interface is displayed on the display component (such as the display panel or the lenses) of the electronic device; on the other hand, the process of downloading the global sub-map from the server or another device (such as another terminal device, or a storage medium such as a hard disk or USB drive) is started.
  • the application may be an application such as AR/VR/MR installed in an electronic device.
  • FIG. 8 shows a graphical user interface (GUI) on the display panel of the electronic device 10, and the GUI is the desktop 101 of the electronic device.
  • when the electronic device detects that the user has clicked the icon 102 of the virtual scene application on the desktop 101, on the one hand, it starts the process of downloading the global sub-map in the background; on the other hand, after starting the virtual scene application, it displays another GUI on the display panel.
  • the user interface 103 of this GUI takes an AR navigation interface as an example, and the user interface 103 may also be referred to as the first interface.
  • the user interface 103 may include a viewing frame 104.
  • the viewing frame 104 can display the preview video stream of the real environment where the electronic device 10 is located in real time.
  • the preview video stream is captured by the camera of the electronic device 10.
  • in the viewfinder 104, virtual objects of the AR application are also superimposed on the preview video stream.
  • the number of virtual objects can be one or more.
  • the virtual objects are exemplified by the navigation instruction 105, the storefront instruction 106, and the interactive robot 107.
  • the navigation instruction 105 can indicate the direction of travel in real time through azimuth arrows.
  • the storefront instruction 106 can accurately display, in real time, the type/name of the store that appears explicitly or implicitly in the video stream of the viewfinder 104.
  • the interactive robot 107 can be used to implement voice conversation, voice introduction, or simply serve as an interesting display on the street, and so on.
  • the pose calculation is performed according to the obtained global sub-map and the video images captured by the camera in real time, so as to obtain in real time the current pose data of the electronic device 10 in the global coordinate system, that is, the global pose.
  • the position and posture of the virtual object in the AR scene can be determined by the global pose of the electronic device 10; that is, the global pose of the electronic device determines the presented position and posture of the virtual object. Therefore, the position and posture of the virtual object in the AR scene can be displayed in the viewing frame 104 in real time based on the global pose of the electronic device 10.
  • AR applications can use computer graphics and visualization techniques to generate virtual objects that do not exist in the real environment and, based on the current global pose of the electronic device 10, superimpose the virtual objects into the viewing frame 104, so that the virtual objects are superimposed on the video stream of the viewing frame 104.
  • the AR application sends the video image to the AR cloud server to request, from the AR cloud server, the object to be rendered corresponding to the video image.
  • the object to be rendered may include an identification of the object to be rendered and/or information such as its name and metadata.
  • the AR cloud server sends the object to be rendered corresponding to the video image to the electronic device 10.
  • the electronic device 10 determines the business rule corresponding to each object to be rendered, and uses the business rule of each object to be rendered to render the corresponding object, thereby generating one or more AR objects (that is, virtual objects); based on the global pose of the electronic device 10, the virtual objects are superimposed on the video stream of the viewing frame 104.
  • the global pose of the electronic device 10 is obtained by the electronic device 10 by performing the pose calculation at a relatively high first frequency, at least according to the video image collected by the camera of the electronic device 10 and the global sub-map.
  • the first frequency represents the frequency at which the SLAM system in the electronic device 10 performs global pose estimation, that is, the frequency at which the global sub-map is invoked.
  • the first frequency may be numerically less than or equal to the frequency at which the video stream is displayed on the display panel.
  • the SLAM system of the electronic device 10 can call the global sub-map at a relatively high fixed frequency to realize the posture tracking of the electronic device 10. In this way, after the global pose of the electronic device 10 is obtained in real time, the position and pose of the virtual object in the AR scene can be displayed and updated in real time in the viewing frame 104 according to the global pose of the electronic device 10.
  • the electronic device 10 can continue to accurately superimpose the virtual object on the video stream in the viewfinder 104.
  • the video stream of the viewfinder 104 can still accurately display the navigation instruction 105, the storefront instruction 106, and the interactive robot 107.
  • on the one hand, the SLAM system in the electronic device 10 calculates the pose of the electronic device 10 based on the global sub-map, which can overcome the problem of inaccuracy of the pre-built SLAM map in the electronic device 10 and avoid the accumulation of pose errors as much as possible, so as to overcome the occurrence of drift to the greatest extent.
  • on the other hand, the electronic device 10 can perform global pose estimation based on the global sub-map stably and at a high frequency, avoiding sudden changes in the global pose; in this way, the occurrence of jumping can be overcome to the greatest extent, and the user experience can be improved.
  • in addition, when the electronic device 10 is about to move out of the geographic range of the global sub-map, it can request to download a new global sub-map in advance based on the location of the electronic device 10, and the electronic device 10 can subsequently use the new global sub-map to perform global pose estimation, which can further avoid sudden pose changes when the geographic ranges corresponding to the two global sub-maps are switched, further improving the user experience.
  • the process from the user clicking on the virtual scene application to the display panel presenting the AR screen can also be implemented in the following manner:
  • a certain virtual scene application (for example, a navigation map application with AR function) is installed in the electronic device 10.
  • (a) in FIG. 9 shows the desktop 101 on the display panel of the electronic device 10.
  • when the electronic device detects that the user has clicked the icon 102 of the virtual scene application on the desktop 101, it displays a user interface 108 as shown in (b) in FIG. 9 on the display panel of the electronic device 10, and the user interface 108 includes text boxes for entering an account number and password, to prompt the user to log in to the application by verifying identity.
  • after the user logs in, the user interface 103 of the application as shown in (c) in FIG. 9 is displayed on the display panel.
  • the user interface 103 may include a viewfinder frame 104.
  • the viewing frame 104 can display the preview video stream of the real environment where the electronic device 10 is located in real time.
  • the preview video stream is captured by the camera of the electronic device 10.
  • a preset navigation input box may also be superimposed on the preview video stream, so as to facilitate input of the origin and destination for AR navigation.
  • starting the process of downloading the global sub-map in the background may consume a certain amount of time (for example, 2 seconds), but because the user is currently presented with the real scene of the video stream, and the user also needs to spend some time inputting the origin and destination for AR navigation, the user will not perceive the map downloading process. In this way, user waiting caused by the download delay can be avoided, further improving the user experience.
  • afterwards, the electronic device 10 can perform pose calculation in the background processor according to the obtained global sub-map and the video images captured by the camera in real time, so as to obtain in real time the current pose data of the electronic device 10 in the global coordinate system, that is, the global pose.
  • the position and posture of the virtual object in the AR scene can be displayed in the viewing frame 104 based on the global pose of the electronic device 10 in real time.
  • the virtual objects are illustratively exemplified by the navigation instruction 105, the storefront instruction 106, and the interactive robot 107.
  • the SLAM system of the electronic device 10 can call the global sub-map at a relatively high fixed frequency to realize the posture tracking of the electronic device 10.
  • the position and pose of the virtual object in the AR scene can be displayed and updated in real time in the viewing frame 104 according to the global pose of the electronic device 10, which can overcome the occurrence of the drift phenomenon and the jumping phenomenon to the greatest extent.
  • the process from the user clicking on the virtual scene application to the display panel presenting the AR screen can also be implemented in the following manner:
  • a certain virtual scene application (for example, a navigation map application with an AR function) is installed in the electronic device 10.
  • (a) in FIG. 10 shows the desktop 101 on the display panel of the electronic device 10.
  • when the electronic device detects that the user has clicked the icon 102 of the virtual scene application on the desktop 101, it displays a user interface 108 of the application as shown in (b) in FIG. 10 on the display panel of the electronic device 10, where the user interface 108 includes an electronic map interface and multiple controls 109, such as an electronic map control, a satellite map control, and an AR control as shown in the figure.
  • the user interface 103 of the application as shown in (c) in FIG. 10 is then displayed on the display panel.
  • the user interface 103 may include a viewing frame 104.
  • the viewing frame 104 can display the preview video stream of the real environment where the electronic device 10 is located in real time.
  • the preview video stream is captured by the camera of the electronic device 10.
  • a preset navigation input box may also be superimposed on the preview video stream, so as to facilitate input of the origin and destination for AR navigation.
  • the electronic device 10 can perform pose calculation in the background processor according to the obtained global sub-map and the video images captured by the camera in real time, so as to obtain in real time the current pose data of the electronic device 10 in the global coordinate system, that is, the global pose.
  • the position and posture of the virtual object in the AR scene can be displayed in the viewing frame 104 based on the global pose of the electronic device 10 in real time.
  • the virtual objects are illustratively exemplified by the navigation instruction 105, the storefront instruction 106, and the interactive robot 107.
  • the SLAM system of the electronic device 10 can call the global sub-map at a relatively high fixed frequency to realize the posture tracking of the electronic device 10.
  • the position and pose of the virtual object in the AR scene can be displayed and updated in real time in the viewing frame 104 according to the global pose of the electronic device 10, which can overcome the occurrence of the drift phenomenon and the jumping phenomenon to the greatest extent.
  • FIG. 11 is a schematic flowchart of another virtual object display method provided by an embodiment of the present application, which is described separately from the electronic device and the server side.
  • the method includes but is not limited to the following steps:
  • the server issues a global sub-map to the electronic device based on the request of the electronic device.
  • the electronic device receives the global sub-map.
  • the global sub-map is a sub-map corresponding to the position of the electronic device in the global map.
  • the electronic device stores the global sub-map in the SLAM system of the electronic device.
  • the electronic device performs pose calculation at a first frequency according to the video image collected by the camera and the global sub-map, so as to continuously update the global pose of the electronic device.
  • the first frequency represents the frequency at which the SLAM system of the electronic device performs global pose estimation, that is, the frequency at which the SLAM system calls the global submap.
  • the first frequency may be less than or equal to the frequency at which the display component displays the video stream.
  • the first frequency is 30 frames per second.
  • the SLAM system can call the global sub-map at a relatively high fixed frequency to realize the posture tracking of the electronic device. In this way, after the global pose of the electronic device is obtained in real time, the position and pose of the virtual object in the virtual scene such as AR/VR/MR can be displayed on the display component according to the global pose of the electronic device.
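  • the fixed-frequency tracking described above might look like the following loop; the callback names are assumptions rather than the patent's API, and 30 Hz is just the example frequency given.

```python
import time

FIRST_FREQUENCY_HZ = 30  # example first frequency: 30 calls per second

def tracking_loop(get_camera_frame, estimate_global_pose, render_virtual_objects):
    period = 1.0 / FIRST_FREQUENCY_HZ
    while True:
        start = time.monotonic()
        frame = get_camera_frame()
        global_pose = estimate_global_pose(frame)   # matches the frame against the global sub-map
        render_virtual_objects(global_pose)         # updates the virtual objects on the display
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```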
  • the global pose of the electronic device is used to indicate the position and posture (azimuth) of the electronic device in the global coordinate system.
  • the position can be represented by coordinates on the three coordinate axes x, y, and z;
  • the posture (orientation) can be represented by (α, β, γ), which indicates the angles of rotation around the three coordinate axes.
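  • for illustration, the (x, y, z, α, β, γ) representation above can be assembled into a single 4x4 pose matrix as sketched below; the rotation axis order is an assumed convention, not specified by the patent.

```python
import numpy as np

def pose_matrix(x, y, z, alpha, beta, gamma):
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])   # rotation around x
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])   # rotation around y
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])   # rotation around z
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx    # assumed Z-Y-X composition order
    T[:3, 3] = [x, y, z]
    return T
```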
  • a camera is provided in the electronic device.
  • the input signal of the SLAM system includes the video image collected by the camera and the global sub-map.
  • the SLAM system can match the video image collected by the camera against the global sub-map, so as to calculate the global pose of the electronic device in the global sub-map.
  • an inertial measurement unit is provided in the electronic device.
  • in this case, the input signal of the SLAM system includes the video image collected by the camera, the motion data collected by the IMU, and the global sub-map.
  • the IMU detects the angular velocity and linear acceleration of the electronic device at a high frequency; by integrating the angular velocity and linear acceleration separately, the pose of the electronic device can be calculated (for example, the pose here may be referred to as the IMU-measured pose).
  • the video image collected by the camera is matched against the global sub-map, so that the pose of the electronic device can also be calculated (for example, the pose here may be referred to as the image-measured pose). Then, through the joint operation of the IMU-measured pose and the image-measured pose, a more accurate final pose can be obtained, which is used as the global pose of the electronic device in the global sub-map.
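  • a highly simplified sketch of the joint operation above follows. Real systems fuse the two estimates with filtering or nonlinear optimization (for example, an extended Kalman filter); the fixed-weight blend below only makes the data flow concrete, and the weight is an assumption.

```python
import numpy as np

def fuse_positions(p_imu, p_image, image_weight=0.8):
    # the image measurement against the global sub-map is weighted more heavily
    # here because the sub-map is more accurate than the locally built SLAM map
    return image_weight * p_image + (1.0 - image_weight) * p_imu

p_imu = np.array([1.02, 2.05, 0.00])   # IMU-measured position (from integration)
p_img = np.array([1.00, 2.00, 0.00])   # image-measured position (global sub-map match)
print(fuse_positions(p_imu, p_img))    # fused global position estimate
```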
  • in addition to the camera and the IMU, the electronic device may also be provided with a positioning module related to pose or movement (GPS positioning, Beidou positioning, WiFi positioning, base station positioning, etc.).
  • in this case, the SLAM system can also jointly refer to the video images collected by the camera, the motion data collected by the IMU, the global sub-map, and the data collected by the positioning module, to calculate the global pose of the electronic device in the global sub-map.
  • the electronic device displays the virtual object on the display component based on the global pose of the electronic device.
  • in conventional SLAM, the IMU signal and the image signal are used as the input of the SLAM system to estimate the pose of the camera.
  • this process also uses SLAM map data as input for the pose estimation.
  • although the internal closed-loop correction module can reduce the long-term drift of the SLAM map, large errors still remain. Therefore, after downloading the sub-map, this application uses the sub-map as an input of the pose estimation module, where it plays the same role in pose estimation as the SLAM map; however, the sub-map is more accurate than the SLAM map.
  • with the sub-map as an input of the pose estimation, the long-term drift of the pose estimation can be eliminated, and the long-term drift and jitter of the SLAM map can also be eliminated.
  • the electronic device requests the server to download the global sub-map, and uses the global sub-map, whose accuracy is higher than that of the SLAM map, as the visual observation input to the SLAM system.
  • in this way, the SLAM system of the electronic device can call the global sub-map at a relatively high fixed frequency to realize the pose tracking of the electronic device.
  • the position and pose of the virtual object in the AR scene can be displayed and updated in real time on the display component according to the global pose of the electronic device, and in the process of long-term pose update, the position and posture of the virtual object will not drift or jump.
This is because, on the one hand, the SLAM system in the electronic device calculates the global pose based on the global sub-map, which overcomes the inaccuracy of the SLAM map pre-built in the electronic device, avoids the accumulation of pose errors as far as possible, and thereby overcomes drift to the greatest extent; on the other hand, the electronic device can perform global pose estimation based on the global sub-map stably and frequently, which greatly reduces sudden changes of the global pose; and furthermore, the global pose is calculated on the electronic device side, so the algorithmic latency of pose estimation is low and pose tracking works well. Therefore, this embodiment can display the virtual object for a long time without the virtual object becoming misaligned in the picture, eliminates the jumping of the virtual object caused by sudden pose changes, and further improves user experience.
Referring to FIG. 12, FIG. 12 is a schematic flowchart of another virtual object display method provided by an embodiment of this application, described from the electronic device side and the server side respectively. The method includes but is not limited to the following steps:
S301: The electronic device sends, to the server, first location fingerprint information used to indicate the initial location of the electronic device.

In this embodiment, the electronic device uploads location fingerprint information to the server. The initial location may be the geographic location of the electronic device when it requests to download the map; sources of the location fingerprint information include the initial position information, signal strength information, signal feature information, and the like measured by positioning methods such as GNSS, WiFi, Bluetooth, or base stations, or it may be position information entered by the user.
S302: The server obtains, from the global map, a global sub-map matching the first location fingerprint information. The global sub-map is the sub-map in the global map that corresponds to the location of the electronic device.

S303: The server delivers the global sub-map to the electronic device. Correspondingly, the electronic device receives the global sub-map.

In one embodiment, the server matches the first location fingerprint information against the location fingerprint information of sub-maps stored in advance in a database, where the sub-maps in the database belong to the global map. If there is a matching sub-map, the matching sub-map is transmitted to the electronic device.

In another embodiment, the server traverses the global map it stores according to the first location fingerprint information until it finds an area whose location fingerprint information matches; the server takes that area out of the global map as the global sub-map and transmits it to the electronic device.
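A minimal server-side sketch of the first embodiment is shown below. The fingerprint format (here, a GNSS fix plus a Wi-Fi RSSI list) and the matching thresholds are assumptions for illustration; the patent does not fix a particular matching rule:

```python
import math

# Hypothetical sub-map index: each entry stores the fingerprint recorded
# for that sub-map when it was built.
SUBMAP_DB = [
    {"id": "mall_f1", "lat": 31.2304, "lon": 121.4737,
     "wifi": {"ap1": -45, "ap2": -70}},
    {"id": "street_a", "lat": 31.2401, "lon": 121.4800,
     "wifi": {"ap9": -60}},
]

def gnss_distance_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in metres (small-angle approximation)."""
    k = 111_000.0  # metres per degree of latitude
    dx = (lon1 - lon2) * k * math.cos(math.radians(lat1))
    dy = (lat1 - lat2) * k
    return math.hypot(dx, dy)

def match_submap(fingerprint, max_dist_m=200.0):
    """Return the stored sub-map whose fingerprint best matches, if any."""
    best = None
    for entry in SUBMAP_DB:
        d = gnss_distance_m(fingerprint["lat"], fingerprint["lon"],
                            entry["lat"], entry["lon"])
        shared = set(fingerprint.get("wifi", {})) & set(entry["wifi"])
        score = d - 50.0 * len(shared)  # shared access points strengthen a match
        if d <= max_dist_m and (best is None or score < best[0]):
            best = (score, entry)
    return best[1] if best else None

print(match_submap({"lat": 31.2305, "lon": 121.4738, "wifi": {"ap1": -50}}))
```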
S304: The electronic device stores the global sub-map in the SLAM system of the electronic device.

S305: The electronic device calculates coordinate system transformation information and transforms the SLAM map based on the coordinate system transformation information.
The process by which the electronic device calculates the coordinate system transformation information can be described as follows. The electronic device acquires the K-th frame image in the video image sequence collected by the camera, where K is an integer greater than or equal to 1. Then, on the one hand, it determines, according to the K-th frame image and the SLAM map, the local pose of the electronic device in the SLAM map under the originally constructed local coordinate system (referred to here as the first pose data).

For example, the electronic device is provided with an IMU, and the input signals of the SLAM system include the video image collected by the camera, the motion data collected by the IMU, and the SLAM map in the local coordinate system. The IMU detects the angular velocity and linear acceleration of the electronic device at a high frequency and integrates them separately to calculate a pose of the electronic device; matching the video image collected by the camera against the SLAM map in the local coordinate system also yields a pose. The first pose data is then obtained by combining these two poses with a certain algorithm.

For another example, if the electronic device is also equipped with a positioning module related to pose or motion (GPS positioning, BeiDou positioning, WiFi positioning, base station positioning, or the like), the SLAM system may calculate the first pose data by jointly referring to the video image collected by the camera, the motion data collected by the IMU, the SLAM map in the local coordinate system, and the data collected by the positioning module.
On the other hand, feature extraction may be performed on the K-th frame image and feature matching performed against the global sub-map to determine the global pose of the electronic device in the global sub-map (referred to here as the second pose data).

Specifically, the electronic device performs feature detection on the K-th frame image and extracts the image positions of features in it; the feature detection algorithm is not limited to methods such as FAST, ORB, SIFT, SURF, D2Net, and SuperPoint. It then computes a feature description for each detected feature; the feature description algorithm is not limited to methods such as ORB, SIFT, SURF, BRIEF, BRISK, FREAK, D2Net, and SuperPoint, each description forming a one-dimensional vector for subsequent feature matching. Through feature matching, the electronic device can find, in the global sub-map, the map content most similar to the K-th frame image (for example, one or more key frames). Feature matching specifically computes the similarity between two feature descriptions: for float-type vectors it can be computed by Euclidean distance, and for binary vectors by XOR (Hamming distance). After the map content most similar to the K-th frame image is found, the pose can be estimated based on the K-th frame image and that map content; for example, registration algorithms such as PnP, EPnP, or 3D-3D can be used to calculate the second pose data.
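As a concrete illustration of this detect-describe-match-register pipeline, the sketch below uses OpenCV's ORB features, Hamming-distance matching, and PnP with RANSAC. Using OpenCV is an assumption of this sketch, and `submap_points_for(...)`, which looks up the 3D global-map point behind each matched key-frame feature, is a hypothetical helper:

```python
import cv2
import numpy as np

def estimate_global_pose(frame_gray, keyframe_gray, submap_points_for, K):
    """Estimate the camera pose in the global sub-map from one video frame.

    frame_gray: current camera image; keyframe_gray: the most similar key
    frame retrieved from the global sub-map; submap_points_for: hypothetical
    helper mapping key-frame keypoint indices to 3D global map points;
    K: 3x3 camera intrinsic matrix.
    """
    orb = cv2.ORB_create(nfeatures=2000)
    kp_f, des_f = orb.detectAndCompute(frame_gray, None)
    kp_m, des_m = orb.detectAndCompute(keyframe_gray, None)

    # Binary ORB descriptors are compared with Hamming (XOR) distance.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_f, des_m)

    pts_2d = np.float32([kp_f[m.queryIdx].pt for m in matches])
    pts_3d = np.float32([submap_points_for(m.trainIdx) for m in matches])

    # PnP registration: 3D global map points against 2D image observations.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    # World-to-camera transform; inverting it gives the device's global pose.
    return R, tvec
```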
Then, according to the first pose data and the second pose data, the electronic device can obtain the coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map. The coordinate system transformation information may be, for example, a coordinate system transformation matrix, where $^{G}T_{L}$ denotes the transformation matrix between the local coordinate system and the global coordinate system. With this matrix, the two coordinate systems can be synchronized: information originally expressed in the local coordinate system (for example, the local pose, image feature points, and the 3D map points of the SLAM map) can be transformed to the global coordinate system based on the coordinate system transformation matrix. In particular, the electronic device can transform the SLAM map in the local coordinate system to the global coordinate system according to the coordinate system transformation information, thereby obtaining the SLAM map in the global coordinate system.
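A minimal numeric sketch of this step follows. It assumes both poses are expressed as 4×4 homogeneous matrices for the same K-th frame, so that $^{G}T_{L} = {}^{G}T_{C}\,({}^{L}T_{C})^{-1}$, where $^{L}T_{C}$ and $^{G}T_{C}$ are the first and second pose data; the variable names are assumptions of this sketch:

```python
import numpy as np

def compute_G_T_L(T_L_C, T_G_C):
    """Coordinate system transformation matrix from local to global.

    T_L_C: first pose data  (device/camera pose in the local SLAM frame)
    T_G_C: second pose data (device/camera pose in the global sub-map frame)
    Both are 4x4 homogeneous matrices for the same K-th frame.
    """
    return T_G_C @ np.linalg.inv(T_L_C)

def transform_map_points(G_T_L, points_local):
    """Re-express Nx3 SLAM map points from the local to the global frame."""
    pts = np.hstack([points_local, np.ones((len(points_local), 1))])
    return (G_T_L @ pts.T).T[:, :3]

# Toy example: the local frame is the global frame shifted by (10, 0, 0).
T_L_C = np.eye(4)
T_G_C = np.eye(4); T_G_C[0, 3] = 10.0
G_T_L = compute_G_T_L(T_L_C, T_G_C)
print(transform_map_points(G_T_L, np.zeros((1, 3))))  # -> [[10. 0. 0.]]
```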
S306: The electronic device performs pose calculation according to the video image collected by the camera and the global sub-map, to obtain the global pose of the electronic device.

After the coordinate systems are unified, the 3D map points in the global sub-map share the coordinate system of the SLAM system, so they can be input as measurement values of the SLAM system to achieve tight coupling between the global sub-map and the SLAM system; the global pose of the electronic device can then be tracked in real time through pose estimation. Specifically, after coordinate system synchronization, the pose in the local coordinate system (the local pose) and the 3D map points of the SLAM map can be transformed to the global coordinate system, so that the pose and 3D map points in the SLAM system and the 3D map points in the global sub-map are represented in the same coordinate system. The 3D map points in the global sub-map can then be used as measurement values of the SLAM system, which effectively eliminates the drift of the SLAM system's pose tracking.
By contrast, the traditional visual measurement values (the 3D map points of the SLAM map) are calculated in the local coordinate system by the triangulation algorithm of the SLAM system, and the accuracy of the 3D map points produced by triangulation depends on the accuracy of pose estimation. Because pose estimation drifts over time, the 3D map points of the SLAM map calculated by the SLAM system carry large errors; conversely, when these 3D map points are used as measurement values, they in turn cause large errors in pose estimation, as shown in formula (3):

$$\min_{\{P_i\}} \sum_{i} \sum_{j} \left\| z_{ij} - \pi\left(P_i,\ p_j\right) \right\|^{2} \qquad (3)$$

where i is the image frame index; j is the index of a feature observed in a given image frame; the left superscript L denotes description in the local coordinate system of the SLAM system and G denotes description in the global coordinate system; p denotes a 3D feature point and P denotes the pose of the electronic device; z denotes the 2D observation of a feature on the image; and π denotes the projection of a 3D point into the image. In the traditional formulation, $^{L}p_{j}$, the coordinate value of the 3D map point calculated by the triangulation algorithm in the SLAM system, serves as the measurement value in the SLAM algorithm. In this application, $^{G}p_{j}$, the coordinate value of the 3D map point corresponding to the j-th feature observed in the i-th image frame, serves as the measurement value instead; this coordinate value comes from the global sub-map (and, after the coordinates of the SLAM map points are converted to the global coordinate system, $^{G}p_{j}$ may also come from the SLAM map in the global coordinate system).
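The sketch below illustrates, under assumptions, how using global-map points as the measurements in formula (3) looks in code: a reprojection residual over known 3D points $^{G}p_{j}$ is minimized for the pose $P_i$ with a generic least-squares solver. The pinhole model and the SciPy solver are assumptions of this illustration, not the patent's implementation:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(pose, points_3d_global, obs_2d, K):
    """Reprojection residuals z_ij - pi(P_i, Gp_j) for one image frame.

    pose: 6-vector (rotation vector, translation) of the world-to-camera
    transform; points_3d_global: Nx3 map points from the global sub-map
    (the measurement values); obs_2d: Nx2 observed 2D features; K: intrinsics.
    """
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    p_cam = points_3d_global @ R.T + pose[3:]
    proj = p_cam @ K.T
    proj = proj[:, :2] / proj[:, 2:3]          # pinhole projection pi(.)
    return (obs_2d - proj).ravel()

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
Gp = np.random.default_rng(0).uniform(-1, 1, (20, 3)) + [0, 0, 5]
true_pose = np.array([0.02, -0.01, 0.0, 0.1, 0.0, 0.0])
# Synthesize observations by projecting the global points with a known pose.
obs = -residuals(true_pose, Gp, np.zeros((20, 2)), K).reshape(-1, 2)

sol = least_squares(residuals, x0=np.zeros(6), args=(Gp, obs, K))
print(np.round(sol.x, 4))  # recovers the pose used to generate obs
```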
S307: The electronic device displays the virtual object on the display component based on the global pose of the electronic device.

S308: The electronic device updates the SLAM map in the global coordinate system based on the global pose of the electronic device.

Specifically, the electronic device can display and update the virtual object on the display component in real time based on its global pose, and it can also feed the global pose back into the SLAM map in the global coordinate system: based on the global pose, the current image frame (key frame) is merged into the SLAM map in the global coordinate system, thereby extending the SLAM map, and the updated SLAM map is more accurate than a traditional SLAM map.
It can be seen that, in this embodiment of this application, the terminal's pose in the local coordinate system and its pose in the global coordinate system are obtained from the same frame, and from these two poses the coordinate system transformation information (for example, a coordinate system transformation matrix) between the two coordinate systems is obtained. With this matrix the two coordinate systems can be synchronized, so that information originally expressed in the local coordinate system (such as the local pose, image feature points, and the 3D map points of the SLAM map) can be transformed to the global coordinate system; the pose and 3D map points in the SLAM system are thus expressed in the same coordinate system as the 3D map points in the global sub-map. The 3D map points in the global sub-map can then be input as measurement values of the SLAM system, achieving tight coupling between the global sub-map and the SLAM system, and the global pose of the electronic device is tracked in real time through pose estimation, which effectively eliminates the drift of SLAM pose tracking. When the SLAM map subsequently needs to be updated, the global pose of the electronic device can be used as pose data in the SLAM map in the global coordinate system to update the SLAM map in the second coordinate system, which improves the accuracy of the SLAM map. In addition, this embodiment makes full use of the computing power of the electronic device to calculate the first pose data, the second pose data, and the coordinate system transformation information, which improves processing efficiency, reduces processing latency, and also lightens the computational burden on the server.
Referring to FIG. 14, FIG. 14 shows, in another specific implementation, the components that may be further included in the functional modules of the electronic device 10 shown in FIG. 5 and in the functional modules of the server 20. The main difference between the embodiment in FIG. 14 and the embodiment in FIG. 6 is that in the functional module architecture of FIG. 14, the functions of the global positioning module 16 are implemented on the server 20 side; that is, the server 20 also includes the image retrieval module 161, the feature extraction module 162, the feature matching module 163, and the pose estimation module 164.

The global positioning module 16 in the server 20 is used to obtain at least one frame of video image uploaded by the electronic device at the initial moment or at any moment thereafter, to calculate, based on that video image, the global pose of the electronic device 10 in the global sub-map (that is, the second pose data), and to send the second pose data to the electronic device 10. Specifically, the image retrieval module 161 may obtain the K-th frame image in the video image sequence uploaded by the electronic device 10; the feature extraction module 162 may perform feature extraction on the K-th frame image to obtain image features; the feature matching module 163 performs feature matching of the image features in the global sub-map to obtain map features that match the image features; and the pose estimation module 164 calculates, according to the image features and the map features, the second pose data of the electronic device 10 in the global sub-map and sends the second pose data to the electronic device 10.

The other functional modules in the electronic device 10 shown in FIG. 14 may be similar to the related description of the electronic device 10 in the embodiment of FIG. 6; for brevity, details are not repeated here. In a specific embodiment, the functional modules in the electronic device 10 can cooperate with one another to perform the electronic-device-side functions in the embodiment shown in FIG. 15, and the functional modules in the server 20 can cooperate with one another to perform the server-side functions in the embodiment shown in FIG. 15.
Referring to FIG. 15, FIG. 15 is a schematic flowchart of yet another virtual object display method provided by an embodiment of this application, described from the electronic device side and the server side respectively. The method includes but is not limited to the following steps:
S501: The electronic device sends, to the server, first location fingerprint information used to indicate the initial location of the electronic device, together with at least one frame of video image.

In this embodiment, to accomplish the first global positioning of the electronic device, the electronic device needs to upload location fingerprint information and at least one currently collected video image to the server. The at least one frame of video image may be the K-th frame image in the video image sequence shot by the electronic device through the camera; for example, it may be the first frame of the sequence. The initial location indicated by the location fingerprint information may be the geographic location of the electronic device when it requests to download the map; sources of the location fingerprint information include the initial position information, signal strength information, signal feature information, and the like measured by GNSS, WiFi, Bluetooth, or base station positioning, or it may be position information entered by the user. In one embodiment, the electronic device may package the first location fingerprint information together with a frame of video image and send them to the server; in another embodiment, it may send them to the server separately.
S502: The server obtains, from the global map, a global sub-map matching the first location fingerprint information. The global sub-map is the sub-map in the global map that corresponds to the location of the electronic device. In one embodiment, the server matches the first location fingerprint information against the location fingerprint information of sub-maps stored in advance in a database, where the sub-maps in the database belong to the global map; if there is a matching sub-map, that sub-map is the global sub-map to be subsequently delivered to the electronic device. In another embodiment, the server traverses the global map it stores according to the first location fingerprint information until it finds an area whose location fingerprint information matches, and takes that area out of the global map as the global sub-map.
S503: The server performs pose calculation according to the video image and the global sub-map, to obtain the global pose of the electronic device in the global sub-map (also referred to here as the second pose data). That is, the first calculation of the electronic device's global pose can be done on the server side. The server-side global pose calculation likewise includes image retrieval, feature extraction, feature matching, and pose estimation. Specifically, the server performs feature detection on the video image and extracts the image positions of features; the feature detection algorithm is not limited to methods such as FAST, ORB, SIFT, SURF, D2Net, and SuperPoint. It then computes a feature description for each detected feature; the feature description algorithm is not limited to methods such as ORB, SIFT, SURF, BRIEF, BRISK, FREAK, D2Net, and SuperPoint, each description forming a one-dimensional vector for subsequent feature matching. Through feature matching, the server can find, in the global sub-map, the map content most similar to the video image (for example, one or more key frames); feature matching computes the similarity between two feature descriptions, by Euclidean distance for float-type vectors and by XOR (Hamming distance) for binary vectors. After the most similar map content is found, pose estimation can be performed based on the video image and that map content; for example, registration algorithms such as PnP, EPnP, or 3D-3D can be used to calculate the second pose data.
S504: The server delivers the global pose of the electronic device in the global sub-map (the second pose data) to the electronic device. Correspondingly, the electronic device receives the second pose data, which it can subsequently use to calculate the coordinate system transformation information (for example, a coordinate system transformation matrix).
S505: The server delivers the global sub-map to the electronic device. Correspondingly, the electronic device receives the global sub-map.

S506: The electronic device stores the global sub-map in the SLAM system of the electronic device.

S507: The electronic device calculates coordinate system transformation information and transforms the SLAM map based on the coordinate system transformation information.

The process by which the electronic device calculates the coordinate system transformation information can be described as follows. The electronic device acquires the K-th frame image in the video image sequence collected by the camera, where the K-th frame image is the same image as the video image sent to the server in S501. Then, on the one hand, it determines, according to the K-th frame image and the SLAM map, the local pose of the electronic device in the SLAM map under the originally constructed local coordinate system (referred to here as the first pose data); the specific implementation may be similar to the description of the first pose data in S305 of the embodiment in FIG. 12 and, for brevity, is not repeated here. On the other hand, the electronic device can then obtain, according to the first pose data and the second pose data obtained through S504, the coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map; the coordinate system transformation information may be, for example, a coordinate system transformation matrix. For related content of the coordinate system transformation matrix, refer to the related description of the embodiment in FIG. 13; for brevity, details are not repeated here.
S508: The electronic device performs pose calculation according to the video image collected by the camera and the global sub-map, to obtain the global pose of the electronic device.

S509: The electronic device displays the virtual object on the display component based on the global pose of the electronic device.

S510: The electronic device updates the SLAM map in the global coordinate system based on the global pose of the electronic device.

S508-S510 may be similar to the related description of S306-S308 in the embodiment of FIG. 12; for brevity, details are not repeated here.
In this embodiment, the electronic device needs to download the global sub-map of the corresponding area first, and downloading the map takes a certain amount of time. To speed up the user's entry into the application, the first global pose estimation can be done on the server side: after the application is started, the first global pose estimation is performed on the server, and while that estimation is running, the server obtains the global sub-map and transmits it to the electronic device. The user therefore does not perceive the latency of the map download process; the waiting caused by download latency is avoided, and user experience is further improved.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions, and when the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one network site, computer, server, or data center to another. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.


Abstract

A virtual object display method, applied to an electronic device (10) having a display component (323) and a camera (324). The method includes: detecting an operation of a user opening an application; in response to the operation, downloading a global sub-map and storing it in the simultaneous localization and mapping (SLAM) system of the electronic device (10), where the global sub-map (123) is the sub-map in the global map that corresponds to the position of the electronic device (10); and displaying the position and posture of a virtual object on the display component (323), where the position and posture of the virtual object are obtained by performing pose calculation at least according to the video images collected by the camera (324) and the global sub-map (123). The method can solve the problem of pose drift of the electronic device (10) and improve user experience.

Description

Virtual Object Display Method and Electronic Device

This application claims priority to Chinese Patent Application No. 201911092326.0, filed with the China National Intellectual Property Administration on November 8, 2019 and entitled "Virtual object display method and electronic device", which is incorporated herein by reference in its entirety.
Technical Field

This application relates to the field of virtual scene technologies, and in particular to a virtual object display method and an electronic device.
Background

Virtual reality (VR), augmented reality (AR), and mixed reality (MR) are multimedia virtual scene technologies that have emerged in recent years. VR is a simulation technology for creating and experiencing a virtual world; AR is a technology for superimposing the virtual world onto the real world and enabling interaction between them; MR is a comprehensive technology that produces a new visual environment by merging the real and virtual worlds, and that builds an interactive feedback loop among the real world, the virtual world, and the user.

In these virtual scene technologies, simultaneous localization and mapping (SLAM) is usually used for the electronic device to locate itself in the environment. Concretely, when an electronic device (for example, a mobile electronic device such as a mobile phone or VR glasses) starts moving from an unknown position in the environment, SLAM allows it to locate itself during movement based on position estimates and a map, and at the same time to build an incremental map on top of its own localization to facilitate subsequent localization. A system or module using SLAM technology may also be called a spatial positioning engine.

In some city-scale interactive VR or AR applications, after the global coordinate system on the server side is aligned with the local coordinate system of the electronic device's SLAM system, virtual objects of the VR or AR application in the global coordinate system can be displayed on the electronic device. However, after the SLAM system runs for a long time, the pose of the electronic device in the local coordinate system drifts, so the display position and orientation of the virtual objects on the electronic device also deviate, which is inconvenient for the user.
Summary

Embodiments of this application provide a virtual object display method and an electronic device, which can solve the pose drift problem of the electronic device to a certain extent and improve user experience.

According to a first aspect, an embodiment of this application provides a virtual object display method. The method may be applied to an electronic device having a display component (for example, a display screen such as a touchscreen, flexible screen, or curved screen, or an optical component) and a camera; the electronic device may be a handheld terminal (such as a mobile phone), VR or AR glasses, an unmanned aerial vehicle, an unmanned vehicle, and so on. The method includes: detecting an operation of a user opening an application; in response to the operation, downloading a global sub-map and storing it in the simultaneous localization and mapping (SLAM) system of the electronic device, where the global sub-map is the sub-map in the global map that corresponds to the position of the electronic device; and displaying the position and posture of the virtual object on the display component, where the position and posture of the virtual object are obtained by the SLAM system performing pose calculation at least according to the video images collected by the camera and the global sub-map.

It should be noted that the pose may be calculated by, for example, a bundle adjustment (BA) method. The posture of the virtual object may refer to, for example, the orientation of the virtual object.

The "operation of opening an application" may be opening the application by clicking, touching, sliding, or shaking, or by voice control or other means, which is not limited in this application. For example, after the electronic device detects the user's touch operation, the navigation function in the application is started, the camera is started, and so on.
In another possible implementation of this application, the electronic device may also trigger the step of downloading the global sub-map in other ways, for example, by detecting a change in ambient light.
The server can serve as a platform providing content and information support to the VR, AR, or MR application on the electronic device. A global map is stored in the server. Generally speaking, the global map is a high-precision map of a large geographic range, where "large geographic range" is relative to the geographic range represented by the SLAM map in the electronic device; for example, the global map may be obtained by integrating, according to certain rules, multiple SLAM maps generated by one or more electronic devices. Correspondingly, the global sub-map is the sub-map in the global map that corresponds to the position of the electronic device; that is, taking the point in the global map corresponding to the actual position of the electronic device as a starting point, the map content within a preset area around that starting point may be taken as the global sub-map.
In this embodiment, the electronic device may be installed with a virtual scene application such as a VR, AR, or MR application and may run it based on a user operation (for example, clicking, touching, sliding, shaking, or voice control). The electronic device collects video images of the environment through a local camera, determines its current pose by combining the collected video images with the downloaded global sub-map, and then displays the position and posture of the virtual object on the display component based on that pose. The virtual object may accordingly be a virtual object in a VR, AR, or MR scene (that is, an object in the virtual environment).
In existing solutions, as the electronic device keeps moving, the SLAM system continuously builds the SLAM map and generates the final pose of the electronic device jointly from the pose estimated on the SLAM map and the pose estimated from data collected by its own sensors. Noise is continuously introduced while the SLAM map is being built, so the pose estimated on the SLAM map accumulates error; the sensor data is also noisy, so the pose estimated from it accumulates error as well, which results in pose drift.

In this embodiment of this application, by contrast, after detecting the operation of the user opening the application, the electronic device requests the server to download the global sub-map and feeds the global sub-map, which is more accurate than the SLAM map, to the SLAM system as visual observation input. The SLAM system uses the global sub-map and the collected video images to estimate the pose of the electronic device, which can effectively reduce or even eliminate the pose drift that appears in long-term SLAM pose estimation. This ensures that during long-term operation of the VR, AR, or MR application (for example, more than 1 minute), the display position and orientation of the virtual object on the electronic device do not drift, and the virtual object is displayed correctly for a long time (for example, displayed in a relatively accurate manner under the environment and duration represented by the video images), thereby improving user experience.
Herein, the pose of the electronic device in the global sub-map may also be called the global pose, and correspondingly, the pose of the electronic device in the SLAM map it builds itself may also be called the local pose.
Based on the first aspect, in a possible implementation, the pose data of the electronic device is used to characterize the position and posture of the virtual object, and the pose data of the electronic device is obtained by the SLAM system performing pose calculation at a first frequency, at least according to the video images collected by the camera and the global sub-map.

The first frequency is the frequency at which the SLAM system in the electronic device performs global pose estimation, that is, the frequency at which it calls the global sub-map. For example, the first frequency may range from 10 to 30 Hz, that is, 10 to 30 frames per second. The first frequency may be numerically less than or equal to the frequency at which the display component (for example, a display panel) displays the video stream.

In other words, the SLAM system of the electronic device can call the global sub-map at a relatively high fixed frequency to achieve pose tracking or pose updating of the electronic device. In this way, after the global pose (that is, the pose data) of the electronic device is obtained in real time, the position and posture of the virtual object in the AR scene can be displayed and updated in real time on the display component according to the global pose, and during long-term pose updating the position and posture of the virtual object do not jump. This is because, on the one hand, during this period the SLAM system in the electronic device accesses the stored global sub-map to calculate the global pose, which overcomes the inaccuracy of the SLAM map pre-built in the electronic device and avoids the accumulation of pose errors as far as possible, thereby overcoming drift to the greatest extent; on the other hand, the electronic device can access the stored global sub-map stably and frequently to perform global pose estimation, greatly reducing sudden changes of the global pose; and furthermore, the global pose is calculated on the electronic device side, so the algorithmic latency of pose estimation is low and pose tracking works well. Therefore, this embodiment can display the virtual object for a long time without misalignment in the picture, eliminates the jumping of the virtual object caused by sudden pose changes, and further improves user experience.
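As an illustration of calling the global sub-map at a fixed first frequency, the following sketch paces a tracking loop at 30 Hz; `grab_frame` and `estimate_global_pose` are hypothetical stand-ins for the camera interface and the pose calculation described above:

```python
import time

FIRST_FREQUENCY_HZ = 30.0  # within the 10-30 Hz range described above

def grab_frame():
    """Hypothetical camera read; returns the latest video image."""
    return object()

def estimate_global_pose(frame, global_submap):
    """Hypothetical pose calculation against the stored global sub-map."""
    return (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)  # (x, y, z, alpha, beta, gamma)

def tracking_loop(global_submap, render, duration_s=1.0):
    period = 1.0 / FIRST_FREQUENCY_HZ
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        t0 = time.monotonic()
        pose = estimate_global_pose(grab_frame(), global_submap)
        render(pose)  # place virtual objects according to the global pose
        time.sleep(max(0.0, period - (time.monotonic() - t0)))

tracking_loop(global_submap={}, render=lambda pose: None)
```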
In this embodiment of this application, the SLAM map may, for example, contain the following map content: multiple key frames, triangulated feature points, and the associations between key frames and feature points. A key frame may be formed from an image collected by the camera and the camera parameters used to produce the image (for example, the pose of the electronic device in the SLAM coordinate system). The feature points may represent different 3D map points along three-dimensional space in the SLAM map, together with feature descriptions at those 3D map points. Each feature point may have an associated feature position, may represent a 3D coordinate position, and may be associated with one or more descriptors. Feature points may also be called 3D features, 3D feature points, or other suitable names.

A 3D map point (or three-dimensional map point) represents coordinates on the three spatial axes X, Y, and Z; for a SLAM map in the local coordinate system, a 3D map point represents coordinates on the X, Y, and Z axes of the local coordinate system, and for a SLAM map in the global coordinate system, coordinates on the X, Y, and Z axes of the global coordinate system.
Based on the first aspect, in a possible implementation, the pose calculation process includes: performing pose calculation according to the video images collected by the camera, the global sub-map, and motion data collected by the electronic device, to obtain the pose data of the electronic device, where the motion data includes motion speed data and motion direction data.

In this embodiment, the motion data collected by the electronic device may be, for example, motion data collected by an inertial measurement unit (IMU) in the electronic device. The IMU can collect information such as the angular velocity and linear acceleration of the electronic device at a high frequency, and integrating such quantities estimates the pose of the electronic device. Based on the collected video images and the high-frequency IMU data, the electronic device can use the SLAM algorithm to call the 3D features (3D map points) of the global sub-map at a high frequency. Introducing the IMU further improves the accuracy of the estimated global pose and ensures that the 3D features (3D map points) of the global sub-map effectively act as measurement values in the SLAM algorithm, so that drift and jumping are avoided by performing pose estimation with high precision.
Based on the first aspect, in a possible implementation, downloading the global sub-map in response to the operation includes: in response to the operation, sending indication information of the initial position of the electronic device to a server, and receiving from the server the global sub-map, which is determined according to the initial position of the electronic device.

In this embodiment of this application, the download of the global sub-map is requested by uploading indication information of the electronic device's initial position, without uploading video images. This obtains a global sub-map related to the initial position of the electronic device while saving bandwidth, reducing the processing burden on the server, and reducing or eliminating the risk of privacy leakage.
Based on the first aspect, in a possible implementation, the indication information of the initial position of the electronic device includes first location fingerprint information used to indicate the initial position of the electronic device; the global sub-map corresponds to second location fingerprint information, and the first location fingerprint information matches the second location fingerprint information.

The initial position indicated by the location fingerprint information may be the geographic location of the electronic device when it requests to download the map; for example, sources of the location fingerprint information include the initial position information, signal strength information, signal feature information, and the like measured by positioning methods such as GNSS, WiFi, Bluetooth, or base stations, or it may be position information entered by the user.

In this embodiment of this application, by matching the uploaded location fingerprint information against the location fingerprint information of global sub-maps on the server, the useful global sub-map can be downloaded to the electronic device side, which improves matching efficiency and accuracy and thus helps reduce the latency of downloading the map.
Based on the first aspect, in a possible implementation, the method further includes: the electronic device updates the SLAM map of the SLAM system according to the pose data of the electronic device.

Specifically, the SLAM map may first be transformed into the coordinate system corresponding to the global sub-map (that is, the global coordinate system). Since the pose of the electronic device and the SLAM map are then both in the global coordinate system, the electronic device can feed its global pose back into the SLAM map in the global coordinate system and, based on the global pose, merge the current image frame (key frame) into that SLAM map, thereby extending the SLAM map; the updated SLAM map is more accurate than a traditional SLAM map.
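A minimal sketch of this update step is shown below, assuming the map is a simple container of key frames and 3D points and that the local-to-global matrix $^{G}T_{L}$ has already been computed as described; the data layout is an assumption of this sketch:

```python
import numpy as np

def transform_map_to_global(slam_map, G_T_L):
    """Re-express every key-frame pose and 3D map point in the global frame."""
    slam_map["points"] = [
        (G_T_L @ np.append(p, 1.0))[:3] for p in slam_map["points"]]
    slam_map["keyframes"] = [
        {**kf, "pose": G_T_L @ kf["pose"]} for kf in slam_map["keyframes"]]
    return slam_map

def merge_keyframe(slam_map, frame, global_pose, new_points):
    """Fuse the current frame into the global-frame SLAM map as a key frame."""
    slam_map["keyframes"].append({"image": frame, "pose": global_pose})
    slam_map["points"].extend(new_points)  # extend/expand the map's 3D points
    return slam_map

slam_map = {"keyframes": [{"image": None, "pose": np.eye(4)}],
            "points": [np.zeros(3)]}
G_T_L = np.eye(4); G_T_L[0, 3] = 10.0
slam_map = transform_map_to_global(slam_map, G_T_L)
slam_map = merge_keyframe(slam_map, None, G_T_L, [np.array([1.0, 0.0, 0.0])])
print(len(slam_map["keyframes"]), slam_map["points"][0])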
Based on the first aspect, in a possible implementation, before the SLAM map of the SLAM system is updated according to the pose data of the electronic device, the method further includes: determining, according to the K-th frame image in the video images collected by the camera and the SLAM map in a first coordinate system, first pose data of the electronic device in the SLAM map in the first coordinate system, where K is an integer greater than or equal to 1; determining, according to the K-th frame image and the global sub-map in a second coordinate system, second pose data of the electronic device in the global sub-map in the second coordinate system; obtaining, according to the first pose data and the second pose data, coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map; and transforming the SLAM map in the first coordinate system into a SLAM map in the second coordinate system according to the coordinate system transformation information. Correspondingly, updating the SLAM map of the SLAM system according to the pose data of the electronic device includes: updating the SLAM map in the second coordinate system by using the pose data of the electronic device as pose data in the SLAM map in the second coordinate system.

The K-th frame image refers to a certain frame in the video image sequence collected by the camera. It should be understood that the video images collected by the camera may be a video sequence (video stream) including multiple frames, and the K-th frame image may be any frame in that stream.

Herein, the coordinate system used to build the SLAM map may be called the first coordinate system; in some application scenarios it may also be called the local coordinate system, the SLAM coordinate system, the camera coordinate system, or some other suitable term. Correspondingly, the pose of the electronic device expressed in the local coordinate system may be called the local pose. The coordinate system used to build the global map may be called the second coordinate system; in some application scenarios it may also be called the global coordinate system, the world coordinate system, or some other suitable term. Correspondingly, the pose of the electronic device expressed in the global coordinate system may be called the global pose.

The pose data of the electronic device is pose data of the electronic device in the first coordinate system or pose data of the electronic device in the second coordinate system, where the first coordinate system is the coordinate system of the SLAM map of the SLAM system and the second coordinate system is the coordinate system of the global sub-map.

In this embodiment of this application, the pose of the terminal in the local coordinate system and its pose in the global coordinate system are obtained from the same frame. From these two poses, the coordinate system transformation information (for example, a coordinate system transformation matrix) between the two coordinate systems can be obtained, and with this matrix the two coordinate systems can be synchronized, so that information originally expressed in the local coordinate system (for example, the local pose, image feature points, and the 3D map points of the SLAM map) can be transformed to the global coordinate system. In this way the pose and 3D map points in the SLAM system are represented in the same coordinate system as the 3D map points in the global sub-map. The 3D map points in the global sub-map can then be input as measurement values of the SLAM system, achieving tight coupling between the global sub-map and the SLAM system, and the global pose of the electronic device is tracked in real time through pose estimation, which effectively eliminates the drift of SLAM pose tracking. When the SLAM map subsequently needs to be updated, the global pose of the electronic device can be used as pose data in the SLAM map in the global coordinate system to update the SLAM map in the second coordinate system.
Based on the first aspect, in a possible implementation, determining the first pose data of the electronic device in the SLAM map in the first coordinate system according to the K-th frame image and the SLAM map in the first coordinate system includes: obtaining the first pose data according to the K-th frame image, the SLAM map in the first coordinate system, and motion data collected by the electronic device, where the motion data includes motion speed data and motion direction data.

For example, an IMU is provided in the electronic device, and the input signals of the SLAM system include the video image collected by the camera, the motion data collected by the IMU, and the SLAM map in the local coordinate system. The IMU detects the angular velocity and linear acceleration of the electronic device at a high frequency and integrates them separately to calculate a pose of the electronic device; matching the video image collected by the camera against the SLAM map in the local coordinate system also yields a pose. The first pose data is then obtained by combining these two poses with a certain algorithm.

For another example, in addition to the camera and the IMU, the electronic device is provided with a positioning module related to pose or motion (GPS positioning, BeiDou positioning, WiFi positioning, base station positioning, or the like); in that case, the SLAM system may jointly refer to the video image collected by the camera, the motion data collected by the IMU, the SLAM map in the local coordinate system, and the data collected by the positioning module to calculate the first pose data, further improving its accuracy.
Based on the first aspect, in a possible implementation, determining the second pose data of the electronic device in the global sub-map in the second coordinate system according to the K-th frame image and the global sub-map in the second coordinate system includes: performing feature extraction on the K-th frame image to obtain image features; performing feature matching of the image features in the global sub-map in the second coordinate system to obtain map features matching the image features; and calculating, according to the image features and the map features, the second pose data of the electronic device in the global sub-map in the second coordinate system.

For example, the electronic device performs feature detection on the K-th frame image and extracts the image positions of features; the feature detection algorithm is not limited to methods such as FAST, ORB, SIFT, SURF, D2Net, and SuperPoint. It then computes a feature description for each detected feature; the feature description algorithm is not limited to methods such as ORB, SIFT, SURF, BRIEF, BRISK, FREAK, D2Net, and SuperPoint, each description forming a one-dimensional vector for subsequent feature matching. Through feature matching, the electronic device can find, in the global sub-map, the map content most similar to the K-th frame image (for example, one or more key frames); specific methods include traditional image retrieval methods based on BOW or VLAD as well as newer retrieval methods based on NetVLAD or AI. After the most similar map content is found, pose estimation can be performed based on the K-th frame image and that map content, for example using registration algorithms such as PnP, EPnP, or 3D-3D, to calculate the second pose data.

This embodiment of this application can be implemented on the electronic device side, making full use of the computing power of the electronic device to calculate the first pose data and the second pose data, which improves processing efficiency and lightens the computational burden on the server.
Based on the first aspect, in a possible implementation, determining the second pose data of the electronic device in the global sub-map in the second coordinate system according to the K-th frame image and the global sub-map in the second coordinate system includes: sending the K-th frame image to a server, and receiving the second pose data from the server, where the second pose data is determined by the server through feature extraction and feature matching according to the K-th frame image and the global sub-map in the second coordinate system.

The K-th frame image may be the first frame image in the video image sequence shot by the camera.

In this embodiment, the electronic device first needs to download the global sub-map of the corresponding area, and downloading the map takes a certain amount of time. To speed up the user's entry into the application, the first global pose estimation can be done on the server side: after the application is started, the first global pose estimation is performed on the server, and while that estimation is running, the server obtains the global sub-map and transmits it to the electronic device, which improves the speed at which the user enters the application. The user does not perceive the latency of the map download process; the waiting caused by download latency is avoided, and user experience is further improved.
Based on the first aspect, in a possible implementation, displaying the position and posture of the virtual object on the display component includes: displaying a first interface on the display component, and displaying a video stream and the virtual object in the first interface, where the position and posture of the virtual object relative to the video stream are displayed based on the pose data of the electronic device, and the pose data of the electronic device is obtained by performing a pose calculation process at least according to the video images collected by the camera and the global sub-map.

The position and posture of the virtual object relative to the video stream are, for example, the position and posture at which the virtual object is superimposed on the video stream; these are displayed based on the pose data of the electronic device, obtained by performing pose calculation at least according to the video stream collected by the camera and the global sub-map.

For example, an AR application can use computer graphics and visualization technologies to generate virtual objects that do not exist in the real environment and, based on the current global pose of the electronic device, superimpose the virtual objects onto the video stream in the viewfinder frame; that is, the position and posture at which the virtual object is superimposed on the video stream are displayed based on the pose data of the electronic device.

It should be noted that the solution of this application can also be applied to VR scenes (for example, VR glasses); in a VR scene, the content shown on the display may contain only virtual objects without a video stream of the real environment.
According to a second aspect, an embodiment of this application provides another virtual object display method, applicable to an electronic device having a display component and a camera; the electronic device may be a handheld terminal (such as a mobile phone), VR or AR glasses, an unmanned aerial vehicle, an unmanned vehicle, and so on. The method includes: obtaining a global sub-map and storing it in the simultaneous localization and mapping (SLAM) system of the electronic device, where the global sub-map is the sub-map in the global map that corresponds to the position of the electronic device; performing pose calculation according to the video images collected by the camera and the global sub-map, to obtain pose data of the electronic device; and displaying the virtual object on the display component based on the pose data of the electronic device (or displaying the position and posture of the virtual object on the display component).

The pose data of the electronic device may be pose data in a first coordinate system (the local coordinate system of the SLAM map generated by the SLAM system) or pose data in a second coordinate system (the global coordinate system corresponding to the global sub-map).

In this embodiment, the electronic device may download the global sub-map from a server, collect video images of the environment through the local camera, determine its current pose by combining the collected video images with the downloaded global sub-map, and then display the position and posture of the virtual object on the display component based on the current pose. The virtual object may accordingly be a virtual object in a VR, AR, or MR scene (that is, an object in the virtual environment). The display component of the electronic device may specifically include a display panel, a lens (for example, of VR glasses), a projection screen, or the like.

Generally speaking, the global map is a high-precision map of a large geographic range, where "large geographic range" is relative to the geographic range represented by the SLAM map in the electronic device; for example, the global map may be obtained by integrating, according to certain rules, multiple SLAM maps generated by one or more electronic devices. Correspondingly, the global sub-map is the sub-map in the global map that corresponds to the position of the electronic device; that is, taking the point in the global map corresponding to the actual position of the electronic device as a starting point, the map content within a preset area around that starting point may be taken as the global sub-map.

In this embodiment of this application, the electronic device requests the server to download the global sub-map and feeds the global sub-map, which is more accurate than the SLAM map, to the SLAM system as visual observation input. The SLAM system uses the global sub-map to estimate the pose of the electronic device, which can effectively reduce or even eliminate the pose drift of long-term SLAM pose estimation, thereby ensuring that during long-term operation of the VR, AR, or MR application (for example, more than 1 minute) the display position and orientation of the virtual object do not drift and the virtual object is displayed correctly for a long time, improving user experience.
Based on the second aspect, in a possible implementation, performing pose calculation according to the video images collected by the camera and the global sub-map to obtain the pose data of the electronic device includes: performing pose calculation at a first frequency, at least according to the video images collected by the camera and the global sub-map, to obtain the pose data of the electronic device.

The first frequency is the frequency at which the SLAM system in the electronic device performs global pose estimation, that is, the frequency at which it calls the global sub-map; it may be numerically less than or equal to the frequency at which the display panel displays the video stream. For example, the first frequency may range from 10 to 30 Hz, that is, 10 to 30 frames per second.

In other words, the SLAM system of the electronic device can call the global sub-map at a relatively high fixed frequency to track the pose of the electronic device. After the global pose is obtained in real time, the position and posture of the virtual object in the AR scene can be displayed and updated in real time on the display component according to the global pose, and during long-term pose updating the position and posture of the virtual object do not jump. This is because, on the one hand, during this period the SLAM system calculates the global pose based on the global sub-map, which overcomes the inaccuracy of the pre-built SLAM map and avoids the accumulation of pose errors as far as possible, overcoming drift to the greatest extent; on the other hand, the electronic device can perform global pose estimation based on the global sub-map stably and frequently, greatly reducing sudden changes of the global pose; and furthermore, the global pose is calculated on the electronic device side, so the algorithmic latency of pose estimation is low and pose tracking works well. Therefore, this embodiment can display the virtual object for a long time without misalignment in the picture, eliminates jumping caused by sudden pose changes, and further improves user experience.
Based on the second aspect, in a possible implementation, the pose data of the electronic device is pose data of the electronic device in a first coordinate system or pose data of the electronic device in a second coordinate system, where the first coordinate system is the coordinate system of the SLAM map of the SLAM system and the second coordinate system is the coordinate system of the global sub-map.
Based on the second aspect, in a possible implementation, performing pose calculation according to the video images collected by the camera and the global sub-map to obtain the pose data of the electronic device includes: performing pose calculation according to the video images collected by the camera, the global sub-map, and motion data collected by the electronic device, where the motion data includes motion speed data and motion direction data. The motion data collected by the electronic device may be, for example, motion data collected by an inertial measurement unit (IMU) in the electronic device. Introducing the IMU further improves the accuracy of the estimated global pose and ensures that the 3D features of the global sub-map effectively act as measurement values in the SLAM algorithm, so that drift and jumping are avoided by performing pose estimation with high precision.
Based on the second aspect, in a possible implementation, the method further includes: the electronic device updates the SLAM map of the SLAM system according to the pose data of the electronic device.
Based on the second aspect, in a possible implementation, before the SLAM map of the SLAM system is updated according to the pose data of the electronic device, the method further includes: determining, according to the K-th frame image in the video images collected by the camera and the SLAM map in the first coordinate system, first pose data of the electronic device in the SLAM map in the first coordinate system, where K is an integer greater than or equal to 1; determining, according to the K-th frame image and the global sub-map in the second coordinate system, second pose data of the electronic device in the global sub-map in the second coordinate system; obtaining, according to the first pose data and the second pose data, coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map; and transforming the SLAM map in the first coordinate system into a SLAM map in the second coordinate system according to the coordinate system transformation information. Correspondingly, updating the SLAM map of the SLAM system according to the pose data of the electronic device includes: updating the SLAM map in the second coordinate system by using the pose data of the electronic device as pose data in the SLAM map in the second coordinate system.

In this embodiment of this application, the terminal's poses in the local and global coordinate systems are obtained from the same frame; from these two poses the coordinate system transformation information (for example, a coordinate system transformation matrix) between the two coordinate systems is obtained, and the two coordinate systems can be synchronized by this matrix. Information originally expressed in the local coordinate system (such as the local pose, image feature points, and the 3D map points of the SLAM map) can then be transformed to the global coordinate system, so that the pose and 3D map points in the SLAM system and the 3D map points in the global sub-map are represented in the same coordinate system. The 3D map points in the global sub-map can then be input as measurement values of the SLAM system, achieving tight coupling between the global sub-map and the SLAM system, and the global pose is tracked in real time through pose estimation, which effectively eliminates the drift of SLAM pose tracking. When the SLAM map subsequently needs to be updated, the global pose of the electronic device can be used as pose data in the SLAM map in the global coordinate system to update the SLAM map in the second coordinate system.
Based on the second aspect, in a possible implementation, obtaining the global sub-map of the global map includes: sending, to a server, first location fingerprint information used to indicate the initial position of the electronic device; and receiving the global sub-map from the server, where the global sub-map corresponds to second location fingerprint information and the first location fingerprint information matches the second location fingerprint information. Performing the map matching operation on the server improves matching efficiency and accuracy, which helps reduce the latency of downloading the map.
Based on the second aspect, in a possible implementation, the virtual object is a virtual object in a virtual reality (VR) scene, an augmented reality (AR) scene, or a mixed reality (MR) scene.
According to a third aspect, an embodiment of this application provides an electronic device for virtual object display, including an interaction module, a data collection module, a communication module, and a SLAM module, where: the interaction module is configured to detect an operation of a user opening an application; the communication module is configured to, in response to the operation, download a global sub-map and store it in the simultaneous localization and mapping (SLAM) module of the electronic device, where the global sub-map is the sub-map in the global map that corresponds to the position of the electronic device; and the interaction module is further configured to display the position and posture of the virtual object on the display component, where the position and posture of the virtual object are obtained by the SLAM module performing pose calculation at least according to the video images collected by the data collection module and the global sub-map.

The SLAM module may be the SLAM system described in the embodiments of this application, for example, the SLAM system 12 described in the embodiments below.
Based on the third aspect, in a possible implementation, the pose data of the electronic device is used to characterize the position and posture of the virtual object, and is obtained by the SLAM module performing pose calculation at a first frequency, at least according to the video images collected by the data collection module and the global sub-map.

Based on the third aspect, in a possible implementation, the pose calculation performed by the SLAM module includes: performing pose calculation according to the video images collected by the data collection module, the global sub-map, and the motion data collected by the data collection module, to obtain the pose data of the electronic device, where the motion data includes motion speed data and motion direction data.

Based on the third aspect, in a possible implementation, the communication module is specifically configured to: in response to the operation, send indication information of the initial position of the electronic device to a server, and receive from the server the global sub-map, which is determined according to the initial position of the electronic device.

Based on the third aspect, in a possible implementation, the indication information of the initial position of the electronic device includes first location fingerprint information used to indicate the initial position; the global sub-map corresponds to second location fingerprint information, and the first location fingerprint information matches the second location fingerprint information.

Based on the third aspect, in a possible implementation, the SLAM module is further configured to update the SLAM map of the SLAM module according to the pose data of the electronic device.

Based on the third aspect, in a possible implementation, the electronic device further includes a global positioning module and a coordinate system transformation matrix calculation module. The SLAM module is specifically configured to determine, according to the K-th frame image in the video images collected by the data collection module and the SLAM map in the first coordinate system, first pose data of the electronic device in the SLAM map in the first coordinate system, where K is an integer greater than or equal to 1. The global positioning module is specifically configured to determine, according to the K-th frame image and the global sub-map in the second coordinate system, second pose data of the electronic device in the global sub-map in the second coordinate system. The coordinate system transformation matrix calculation module is specifically configured to obtain, according to the first pose data and the second pose data, the coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map. The SLAM module is further configured to transform the SLAM map in the first coordinate system into a SLAM map in the second coordinate system according to the coordinate system transformation information, and to update the SLAM map in the second coordinate system by using the pose data of the electronic device as pose data in that SLAM map.

Based on the third aspect, in a possible implementation, the SLAM module is specifically configured to obtain the first pose data according to the K-th frame image, the SLAM map in the first coordinate system, and the motion data collected by the data collection module, where the motion data includes motion speed data and motion direction data.

Based on the third aspect, in a possible implementation, the global positioning module is specifically configured to: perform feature extraction on the K-th frame image to obtain image features; perform feature matching of the image features in the global sub-map in the second coordinate system to obtain map features matching the image features; and calculate, according to the image features and the map features, the second pose data of the electronic device in the global sub-map in the second coordinate system.

Based on the third aspect, in a possible implementation, the communication module is further configured to: send the K-th frame image to a server, and receive the second pose data from the server, where the second pose data is determined by the server through feature extraction and feature matching according to the K-th frame image and the global sub-map in the second coordinate system.

Based on the third aspect, in a possible implementation, the interaction module is specifically configured to: display a first interface on the display component, and display a video stream and the virtual object in the first interface, where the position and posture of the virtual object relative to the video stream are displayed based on the pose data of the electronic device, obtained by performing a pose calculation process at least according to the video images collected by the data collection module and the global sub-map.
According to a fourth aspect, an embodiment of this application provides an electronic device for virtual object display, including an interaction module, a data collection module, a communication module, and a SLAM module, where: the communication module is configured to obtain a global sub-map and store it in the simultaneous localization and mapping (SLAM) module of the electronic device, the global sub-map being the sub-map in the global map that corresponds to the position of the electronic device; the SLAM module is configured to perform pose calculation according to the video images collected by the data collection module and the global sub-map, to obtain pose data of the electronic device; and the interaction module is configured to display the virtual object on the display component based on the pose data of the electronic device.

The SLAM module may be the SLAM system described in the embodiments of this application, for example, the SLAM system 12 described in the embodiments below.

Based on the fourth aspect, in a possible implementation, the SLAM module is specifically configured to perform pose calculation at a first frequency, at least according to the video images collected by the data collection module and the global sub-map, to obtain the pose data of the electronic device.

Based on the fourth aspect, in a possible implementation, the pose data of the electronic device is pose data in a first coordinate system or pose data in a second coordinate system, where the first coordinate system is the coordinate system of the SLAM map of the SLAM module and the second coordinate system is the coordinate system of the global sub-map.

Based on the fourth aspect, in a possible implementation, the SLAM module is specifically configured to perform pose calculation according to the video images collected by the data collection module, the global sub-map, and the motion data collected by the data collection module, to obtain the pose data of the electronic device, where the motion data includes motion speed data and motion direction data.

Based on the fourth aspect, in a possible implementation, the SLAM module is further configured to update the SLAM map of the SLAM module according to the pose data of the electronic device.

Based on the fourth aspect, in a possible implementation, the electronic device further includes a global positioning module and a coordinate system transformation matrix calculation module. The SLAM module is specifically configured to determine, according to the K-th frame image in the video images collected by the data collection module and the SLAM map in the first coordinate system, first pose data of the electronic device in the SLAM map in the first coordinate system, where K is an integer greater than or equal to 1. The global positioning module is specifically configured to determine, according to the K-th frame image and the global sub-map in the second coordinate system, second pose data of the electronic device in the global sub-map in the second coordinate system. The coordinate system transformation matrix calculation module is specifically configured to obtain, according to the first pose data and the second pose data, the coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map. The SLAM module is further configured to transform the SLAM map in the first coordinate system into a SLAM map in the second coordinate system according to the coordinate system transformation information, and to update the SLAM map in the second coordinate system by using the pose data of the electronic device as pose data in that SLAM map.

Based on the fourth aspect, in a possible implementation, the communication module is further configured to send, to a server, first location fingerprint information used to indicate the initial position of the electronic device, and to receive the global sub-map from the server, where the global sub-map corresponds to second location fingerprint information and the first location fingerprint information matches the second location fingerprint information.

Based on the fourth aspect, in a possible implementation, the virtual object is a virtual object in a virtual reality (VR) scene, an augmented reality (AR) scene, or a mixed reality (MR) scene.
According to a fifth aspect, an embodiment of this application provides an electronic device for virtual object display, including a display component, a camera, one or more processors, a memory, one or more application programs, and one or more computer programs. The one or more computer programs are stored in the memory and include instructions; when the instructions are executed by the electronic device, the electronic device performs the virtual object display method described in the first aspect or any possible implementation of the first aspect.

According to a sixth aspect, an embodiment of this application provides an electronic device for virtual object display, including a display component, a camera, one or more processors, a memory, and one or more computer programs. The one or more computer programs are stored in the memory and include instructions; when the instructions are executed by the electronic device, the electronic device performs the virtual object display method described in the second aspect or any possible implementation of the second aspect.

According to a seventh aspect, an embodiment of this application provides a chip including a processor and a data interface; the processor reads, through the data interface, instructions stored in a memory to perform the virtual object display method in the first aspect or any possible implementation of the first aspect. Optionally, as an implementation, the chip may further include a memory storing instructions; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the virtual object display method in the first aspect or any possible implementation of the first aspect.

According to an eighth aspect, an embodiment of this application provides a chip including a processor and a data interface; the processor reads, through the data interface, instructions stored in a memory to perform the virtual object display method in the second aspect or any possible implementation of the second aspect. Optionally, as an implementation, the chip may further include a memory storing instructions; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the virtual object display method in the second aspect or any possible implementation of the second aspect.

According to a ninth aspect, an embodiment of this application provides a computer-readable storage medium storing program code to be executed by a device, where the program code includes instructions for performing the method in the first aspect or any possible implementation of the first aspect, or instructions for performing the method in the second aspect or any possible implementation of the second aspect.

According to a tenth aspect, an embodiment of this application provides a computer program product, which may be a software installation package; the computer program product includes program instructions, and when the computer program product is executed by an electronic device, the processor of the electronic device performs the method in any possible implementation of the first or second aspect.
It can be seen that, by implementing the embodiments of this application, the electronic device requests the server to download the global sub-map and feeds the global sub-map, which is more accurate than the SLAM map, to the SLAM system as visual observation input; the SLAM system of the electronic device can call the global sub-map at a relatively high fixed frequency to track the pose of the electronic device. In this way, after the global pose of the electronic device is obtained in real time, the position and posture of the virtual object in the AR scene can be displayed and updated in real time on the display component according to the global pose, and during long-term pose updating the position and posture of the virtual object neither drift nor jump.
Brief Description of Drawings

To describe the technical solutions in the embodiments of this application or the background more clearly, the following describes the accompanying drawings used in the embodiments of this application or the background.

FIG. 1 is a schematic diagram of an application architecture according to an embodiment of this application;

FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of this application;

FIG. 3 is a schematic structural diagram of a server according to an embodiment of this application;

FIG. 4 is a comparison between a pose-drift case and the ideal case in an AR scene;

FIG. 5 is a schematic structural diagram of a system, and of the electronic device and server in the system, according to an embodiment of this application;

FIG. 6 is a schematic structural diagram of another system, and of the electronic device and server in the system, according to an embodiment of this application;

FIG. 7 is a schematic flowchart of a virtual object display method according to an embodiment of this application;

FIG. 8 is a schematic diagram of a scene implemented by the method of this application according to an embodiment of this application;

FIG. 9 is a schematic diagram of another scene implemented by the method of this application according to an embodiment of this application;

FIG. 10 is a schematic diagram of yet another scene implemented by the method of this application according to an embodiment of this application;

FIG. 11 is a schematic flowchart of another virtual object display method according to an embodiment of this application;

FIG. 12 is a schematic flowchart of yet another virtual object display method according to an embodiment of this application;

FIG. 13 is a schematic diagram of a scene related to a coordinate system transformation matrix according to an embodiment of this application;

FIG. 14 is a schematic structural diagram of yet another system, and of the electronic device and server in the system, according to an embodiment of this application;

FIG. 15 is a schematic flowchart of yet another virtual object display method according to an embodiment of this application.
Detailed Description

The following describes the embodiments of this application with reference to the accompanying drawings. The terms used in the implementation part of this application are only used to explain specific embodiments of this application and are not intended to limit this application. The terms "first", "second", "third", and the like in the specification, claims, and drawings of this application are used to distinguish different objects rather than to define a particular order. In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation; any embodiment or design described as "exemplary" or "for example" should not be construed as being preferable or advantageous over other embodiments or designs. Rather, such words are intended to present related concepts in a concrete manner.

The singular forms "a", "said", and "the" used in the embodiments of this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used herein refers to and includes any or all possible combinations of one or more of the associated listed items. It will further be understood that the terms "include", "have", "comprise", and/or "contain", when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should be noted that the terms used in the embodiments of this application are merely for the purpose of describing specific embodiments and are not intended to limit this application.
An application architecture involved in the embodiments of this application is described first.

Referring to FIG. 1, the application architecture provided by an embodiment of this application includes an electronic device 10 and a server 20, which can communicate with each other, for example through wireless fidelity (WiFi) communication, Bluetooth communication, or cellular 2/3/4/5th-generation (2G/3G/4G/5G) communication.

The electronic device 10 may be any of various types of devices equipped with a camera and a display component, for example a terminal device such as a mobile phone, tablet computer, notebook computer, or video recorder (FIG. 1 takes a mobile phone as an example); it may also be a device for virtual scene interaction, such as VR glasses, an AR device, or an MR interaction device; a wearable electronic device such as a smart watch or smart band; or an on-board device in a vehicle such as an unmanned vehicle or unmanned aerial vehicle. The specific form of the electronic device is not specially limited in the embodiments of this application.

In addition, the electronic device 10 may also be called user equipment (UE), a subscriber station, a mobile unit, a subscriber unit, a wireless unit, a remote unit, a mobile device, a wireless device, a wireless communication device, a remote device, a mobile subscriber station, a terminal device, an access terminal, a mobile terminal, a wireless terminal, a smart terminal, a remote terminal, a handset, a user agent, a mobile client, a client, or some other suitable term.

The server 20 may specifically be one or more physical servers (FIG. 1 exemplarily shows one physical server), a computer cluster, a virtual machine in a cloud computing scenario, or the like.
In this embodiment of this application, the electronic device 10 may be installed with a virtual scene application such as a VR, AR, or MR application, and may run it based on a user operation (for example, clicking, touching, sliding, shaking, or voice control). The electronic device can collect video images of any object in the environment through the local camera and/or sensors, and display virtual objects on the display component according to the collected video images. The virtual object may accordingly be a virtual object in a VR, AR, or MR scene (that is, an object in the virtual environment).

It should be noted that, in this embodiment of this application, the virtual scene application in the electronic device 10 may be an application built into the electronic device itself or an application provided by a third party and installed by the user, which is not limited here.
In this embodiment of this application, the electronic device 10 is also configured with a simultaneous localization and mapping (SLAM) system, which can create a map in a completely unknown environment and use that map for autonomous positioning, pose (position and posture) determination, navigation, and so on. Herein, the map built by the SLAM system is called the SLAM map for short; the SLAM map can be understood as a map drawn by the SLAM system according to environment information collected by collection devices, which may include, for example, an image collection apparatus in the electronic device (for example, a camera) and an inertial measurement unit (IMU), where the IMU may include sensors such as a gyroscope and an accelerometer.

For example, the SLAM map may contain the following map content: multiple key frames, triangulated feature points, and the associations between key frames and feature points. A key frame may be formed from an image collected by the camera and the camera parameters used to produce the image (for example, the pose of the electronic device in the SLAM coordinate system). The feature points may represent different 3D map points along three-dimensional space in the SLAM map, together with feature descriptions at those 3D map points; each feature point may have an associated feature position, may represent a 3D coordinate position, and may be associated with one or more descriptors. Feature points may also be called 3D features, 3D feature points, or other suitable names.

A 3D map point (or three-dimensional map point) represents coordinates on the three spatial axes X, Y, and Z; for a SLAM map in the local coordinate system, these are coordinates on the X, Y, and Z axes of the local coordinate system, and for a SLAM map in the global coordinate system, coordinates on the X, Y, and Z axes of the global coordinate system.
The coordinate system used to build the SLAM map may be called the first coordinate system; in some application scenarios it may also be called the local coordinate system, the SLAM coordinate system, the camera coordinate system, or some other suitable term. For ease of understanding, the following description mainly uses the name "local coordinate system". Correspondingly, the pose of the electronic device expressed in the local coordinate system may be called the local pose.
In this embodiment of this application, the server 20 can serve as a platform providing content and information support to the VR, AR, or MR application of the electronic device 10. The server 20 also stores a map, herein called the global map for short. Generally, compared with the SLAM map in a single electronic device, the global map covers a larger area and its map content is more accurate; the server maintains and updates the global map. In one implementation, the global map may be built offline in the server in advance; in another implementation, the global map may be obtained by integrating, according to certain rules, multiple SLAM maps collected by one or more electronic devices.

The coordinate system used to build the global map may be called the second coordinate system; in some application scenarios it may also be called the global coordinate system, the world coordinate system, or some other suitable term. For ease of understanding, the following description mainly uses the name "global coordinate system". Correspondingly, the pose of the electronic device expressed in the global coordinate system may be called the global pose.
Referring to FIG. 2, FIG. 2 exemplarily shows a schematic structural diagram of the electronic device 10. It should be understood that the structure illustrated in this embodiment of this application does not constitute a specific limitation on the electronic device 10. In other embodiments of this application, the electronic device 10 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The components shown in the figure may be implemented in hardware including one or more signal processing and/or application-specific integrated circuits, in software, or in a combination of hardware and software.

As shown in FIG. 2, the electronic device 10 may include a chip 310, a memory 315 (one or more computer-readable storage media), a user interface 322, a display component 323, a camera 324, a positioning module 331 for device positioning, and a transceiver 332 for communication. These components may communicate over one or more communication buses 314.
The chip 310 may integrate one or more processors 311, a clock module 312, and a power management module 313. The clock module 312 integrated in the chip 310 is mainly used to provide the processor 311 with the timers required for data transmission and timing control, and the timers implement the clock functions of data transmission and timing control. The processor 311 can perform operations and generate operation control signals according to instruction operation codes and timing signals, completing the control of instruction fetching and execution. The power management module 313 integrated in the chip 310 is mainly used to provide a stable, high-precision voltage for the chip 310 and the other components of the electronic device 10.

The processor 311 may also be called a central processing unit (CPU), and may specifically include one or more processing units, for example an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or integrated into one or more processors.

In some embodiments, the processor 311 may include one or more interfaces, which may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.
The memory 315 may be connected to the processor 311 through a bus or coupled with the processor 311, and is used to store various software programs and/or multiple sets of instructions. In a specific implementation, the memory 315 may include high-speed random access memory (for example, cache memory) and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 315 may store an operating system, for example an embedded operating system such as ANDROID, IOS, WINDOWS, or LINUX. The memory 315 is also used to store the relevant programs of the SLAM system and to store data (for example, image data, point cloud data, map data, key frame data, pose data, and coordinate system transformation information). The memory 315 may also store a communication program for communicating with one or more servers or other devices, and one or more application programs, which as illustrated may include virtual scene applications such as AR/VR/MR, map applications, image management applications, and so on. The memory 315 may also store a user interface program, which can vividly display the content of an application (such as virtual objects in AR/VR/MR virtual scenes) through a graphical operation interface, present it via the display component 323, and receive the user's control operations on the application through input controls such as menus, dialog boxes, and buttons. The memory 315 may be used to store computer-executable program code, which includes instructions.
The user interface 322 may be, for example, a touch panel, through which the user's operation instructions on the touch panel can be detected, or it may be a keypad, physical buttons, or a mouse.
The electronic device 10 may include one or more display components 323. The electronic device 10 may jointly implement the display function through the display component 323 and the graphics processing unit (GPU) and application processor (AP) in the chip 310. The GPU, a microprocessor for image processing, connects the display component 323 and the application processor and performs mathematical and geometric calculations for graphics rendering. The display component 323 displays the interface content currently output by the system, for example images and videos in AR/VR/MR virtual scenes; the interface content may include the interface of a running application and system-level menus, and may be composed of the following interface elements: input interface elements such as buttons, text input boxes, scroll bars, and menus; and output interface elements such as windows, labels, images, videos, and animations.

In a specific implementation, the display component 323 may be a display panel, a lens (for example, of VR glasses), a projection screen, or the like. The display panel may also be called a display screen, for example a touchscreen, flexible screen, or curved screen, or another optical component. That is to say, when the electronic device in this application has a display screen, the display screen may be a touchscreen, a flexible screen, a curved screen, or a screen in another form; the display screen has the function of displaying images, and the specific material and shape of the display screen are not limited in this application.

For example, when the display component 323 includes a display panel, the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, quantum dot light-emitting diodes (QLED), and so on. In some specific implementations, the touch panel in the user interface 322 and the display panel in the display component 323 may be coupled together; for example, the touch panel may be disposed under the display panel to detect the touch pressure applied on the display panel when the user inputs touch operations through it (for example, clicking, sliding, or touching), while the display panel is used for content display.
The camera 324 may be a monocular camera, a binocular camera, or a depth camera, and is used to shoot/record the environment to obtain images/video images. The images/video images collected by the camera 324 may, for example, serve as one kind of input data for the SLAM system, or may be displayed as images/video through the display component 323.

In some scenarios, the camera 324 may also be regarded as a sensor. The images collected by the camera 324 may be in IMG format or other format types, which is not limited here.
The sensor 325 may be used to collect data related to state changes of the electronic device 10 (for example, rotation, swinging, movement, or shaking); the data collected by the sensor 325 may serve as one kind of input data for the SLAM system. The sensor 325 may include one or more sensors, for example an inertial measurement unit (IMU) and a time-of-flight (TOF) sensor. The IMU may further include sensors such as a gyroscope and an accelerometer: the gyroscope measures the angular velocity of the electronic device while it moves, and the accelerometer measures its acceleration. The TOF sensor may further include a light emitter and a light receiver; the light emitter emits light outward, for example laser light, infrared light, or radar waves, and the light receiver detects the reflected light, for example the reflected laser light, infrared light, or radar waves.

It should be noted that the sensor 325 may also include more other sensors, such as an inertial sensor, a barometer, a magnetometer, and a wheel speedometer.
The positioning module 331 is used to physically position the electronic device 10, for example to obtain its initial position. The positioning module 331 may include one or more of a WiFi positioning module, a Bluetooth positioning module, a base station positioning module, and a satellite positioning module. A global navigation satellite system (GNSS) may be provided in the satellite positioning module to assist positioning; GNSS is not limited to the BeiDou system, GPS, GLONASS, or Galileo.
The transceiver 332 is used for communication between the electronic device 10 and servers or other terminal devices. It integrates a transmitter and a receiver, used respectively for sending and receiving radio frequency signals. In a specific implementation, the transceiver 332 may include but is not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card, a storage medium, and so on. In some embodiments, the transceiver 332 may also be implemented on a separate chip. The transceiver 332 may, for example, support data network communication through at least one of 2G/3G/4G/5G, and/or at least one of the following short-range wireless communication methods: Bluetooth (BT) communication, wireless fidelity (WiFi) communication, near field communication (NFC), infrared (IR) wireless communication, ultra wide band (UWB) communication, and ZigBee communication.
In this embodiment of this application, the processor 311 executes the various functional applications and data processing of the electronic device 10 by running the instructions stored in the memory 315; specifically, it can perform the method steps shown in the embodiment of FIG. 7, or the electronic-device-side functions in the embodiments of FIG. 11, FIG. 12, or FIG. 15.
Referring to FIG. 3, FIG. 3 is a structural block diagram of an implementation of the server 20 according to an embodiment of this application. As shown in FIG. 3, the server 20 includes a processor 403, a memory 401 (one or more computer-readable storage media), and a transceiver 402; these components may communicate over one or more communication buses 404. Therein:

The processor 403 may be one or more central processing units (CPUs); when the processor 403 is one CPU, it may be a single-core or multi-core CPU.

The memory 401 may be connected to the processor 403 through a bus or coupled with the processor 403, and is used to store various software programs and/or multiple sets of instructions, as well as data (for example, map data and pose data). In a specific implementation, the memory 401 includes but is not limited to random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM).

The transceiver 402 mainly integrates a receiver and a transmitter; the receiver receives data sent by electronic devices (for example, requests and images), and the transmitter sends data to electronic devices (for example, map data and pose data).

It should be understood that the server 20 is merely one example provided by this embodiment of this application; in a specific implementation, the server 20 may have more components than illustrated.

In a specific embodiment of this application, the processor 403 may be used to call the program instructions in the memory 401 to perform the server-side functions in the embodiments of FIG. 11, FIG. 12, or FIG. 15.
The term "coupled" as used herein means directly connected to, or connected through one or more intervening components or circuits. Any signal provided on the various buses described herein may be time-multiplexed with other signals and provided on one or more shared buses. In addition, the interconnection between circuit elements or software blocks may be shown as a bus or as a single signal line; each bus may alternatively be a single signal line, each single signal line may alternatively be a bus, and a single line or bus may represent any one or more of a large number of physical or logical mechanisms for communication between components.
In virtual scenes such as AR/VR/MR, the position/posture of the virtual object presented in real time on the display component is closely related to the position/posture of the electronic device itself; that is, the pose data of the electronic device determines the form, size, content, and so on of the presented virtual object, so the pose of the electronic device needs to be tracked. In an existing pose tracking solution, the SLAM system extracts features from the input image and obtains one estimated pose based on the already-built SLAM map, obtains another estimated pose by processing information such as the acceleration and angular velocity collected by the IMU, and finally combines the two estimated poses with a certain algorithm to get the final pose of the electronic device. However, every estimated pose carries a certain error, so during pose tracking the errors iterate and accumulate, growing larger and larger. Therefore, after the SLAM system runs for a long time (for example, 1 minute), the pose of the electronic device in the local coordinate system (the local pose) drifts, and the display position and orientation of the virtual object on the electronic device also deviate noticeably.

To correct the drift, in an existing solution the electronic device transmits the current image to a server; the server calculates the pose of the electronic device in the global coordinate system based on that image and returns it to the electronic device (with an intermediate latency of, for example, 2 seconds), and the electronic device calculates and updates the coordinate transformation matrix between the global and local coordinate systems. Because drift already exists in the SLAM system, the newly calculated coordinate system transformation matrix differs noticeably in value from the previously calculated one; when the SLAM system uses the newly calculated matrix to update the local pose, the local pose changes significantly, causing the presented virtual object to jump.

For ease of understanding, take the AR scene shown in FIG. 4 as an example; FIG. 4 shows a scene with pose drift and the ideal scene. In the figure, 104 denotes the real image of the environment, and the navigation indicator 105, the storefront indicator 106, and the interactive robot 107 are all virtual objects generated using graphics and visualization technologies. Based on the current pose of the electronic device, the positions and postures of these virtual objects in the picture can be calculated so that they are fused with the real environment image to give the user a realistic sensory experience. Comparing the two cases, when pose drift occurs in the SLAM system, these virtual objects are unreasonably offset in the picture, distorting the fusion of the virtual objects with the real environment image and greatly degrading the user's experience. When the SLAM system corrects the drift, these virtual objects jump from the unreasonably offset state back to the normal ideal state, so the user sees the "virtual objects jumping", which also gives a poor experience.
Referring to FIG. 5, FIG. 5 is a structural block diagram of a system according to an embodiment of this application. The system includes the electronic device 10 and the server 20, which can establish a communication connection through their respective transceivers. To reduce or even eliminate pose drift and jumping on the electronic device 10 and achieve highly accurate pose tracking, in this embodiment the electronic device 10 is configured with a SLAM system 12, a data collection module 13, an interaction module 14, and a communication module 11, which may exist in the form of software code; in a specific implementation, the data/programs of these functional modules may be stored in the memory 315 shown in FIG. 2 and run on the processor 311 shown in FIG. 2. Therein:

The communication module 11 can use the transceiver 332 shown in FIG. 2 to communicate with the server 20. Specifically, the communication module 11 is configured to obtain from the server 20 the global sub-map, which is the sub-map in the global map stored by the server 20 that corresponds to the position information of the electronic device 10, and to store the global sub-map in the global sub-map database 123 in the SLAM system 12 of the electronic device 10.

The data collection module 13 is configured to use the sensor 325 shown in FIG. 2 to obtain the state data of the electronic device 10, the camera 324 shown in FIG. 2 to obtain video images, and the positioning module 331 shown in FIG. 2 to obtain the position of the electronic device.

The interaction module 14 is configured to use the user interface 322 shown in FIG. 2 to detect user operations, and the display component 323 shown in FIG. 2 to display images/videos/virtual objects, for example the display of AR/VR/MR application content.

The calculation module 121 in the SLAM system 12 is configured to perform pose calculation according to the video images collected by the camera 324 and the downloaded global sub-map, to obtain the pose data of the electronic device 10; the interaction module 14 can display the virtual object on the display component based on the pose data of the electronic device 10.

The SLAM map built by the SLAM system 12 itself is stored in the SLAM map database 122, and the SLAM system is also configured to update the SLAM map in the database 122 based on the pose data of the electronic device 10.

In a specific embodiment, the functional modules in the electronic device 10 can cooperate with one another to perform the steps of the method shown in the embodiment of FIG. 7, or the electronic-device-side functions in the embodiments of FIG. 11, FIG. 12, or FIG. 15.
The server 20 is configured with a communication module 21, a processing module 22, and a global map database 23, which may exist in the form of software code; in a specific implementation, the data/programs of these functional modules may be stored in the memory 401 shown in FIG. 3 and run on the processor 403 shown in FIG. 3. Therein:

The global map database 23 is used to store, maintain, and update the global map.

The processing module 22 may be configured to obtain, from the global map stored in the database 23 and based on the position information of the electronic device 10, the sub-map corresponding to that position information, that is, the global sub-map.

The communication module 21 can use the transceiver 402 shown in FIG. 3 to communicate with the electronic device 10; specifically, the communication module 21 can send the global sub-map to the electronic device 10.

In a specific embodiment, the functional modules in the server 20 can cooperate with one another to perform the server-side functions in the embodiments of FIG. 11, FIG. 12, or FIG. 15.
Referring to FIG. 6, FIG. 6 shows, in one specific implementation, the components (or sub-modules) that the functional modules of the electronic device 10 shown in FIG. 5 may further include, and the components (or sub-modules) that the functional modules of the server 20 may further include. It should be noted that the components that the functional modules of the electronic device 10 (for example, the SLAM system, the data collection module 13, and the interaction module 14) and of the server 20 (for example, the processing module 22) may further include are only examples of this embodiment; in other embodiments, these functional modules may include more or fewer components (or sub-modules).

As shown in FIG. 6, the calculation module 121 in the SLAM system 12 further includes a mapping module 1211, a pose estimation module 1212, a feature processing module 1213, and a loop-closure correction module 1214. In addition, the electronic device 10 also includes a global positioning module 16 and a software development kit (SDK). The global positioning module 16 further includes an image retrieval module 161, a feature extraction module 162, a feature matching module 163, and a pose estimation module 164. The SDK may include databases for the global pose and the local pose respectively, as well as the coordinate system transformation matrix calculation module 15; the SDK may also call the interaction module 14 to display content through the display component.

The feature processing module 1213 can be used for operations related to visual feature processing; for example, in one embodiment it may further include a feature extraction module (not shown) and a feature matching module (not shown). The feature extraction module includes a feature detection function and a feature description function: feature detection extracts the image positions of features in the image, and feature description describes each detected feature to form a one-dimensional vector used in the feature matching of the feature matching module.

The data collection module 13 (for example, the IMU sensor) can output high-frequency angular velocity and linear acceleration; the pose estimation module 1212 integrates them separately and performs pose estimation in combination with the video images shot by the camera, so the position and posture of the electronic device can be calculated. The result of pose estimation can be output by the SLAM system and also serves as input to the mapping module 1211. The mapping module 1211 creates, in the local coordinate system, a map of the environment that the SLAM system can perceive, that is, the SLAM map. As the SLAM system keeps running through space, it continuously creates/updates the SLAM map. When the SLAM system returns to a scene it has passed through, the loop-closure correction module 1214 can be used to reduce the errors that the SLAM map may have accumulated. In some embodiments, the SLAM map can in turn serve as input for pose estimation by the pose estimation module 1212, improving the accuracy of pose estimation.
In one embodiment, the database 123 stores the global sub-map downloaded from the server 20, and the pose estimation module 1212 is configured to perform pose calculation according to the global sub-map, the video images collected by the camera 324, and the motion data collected by the sensor 325, to obtain the pose data of the electronic device 10, that is, its global pose, thereby tracking and locating the pose of the electronic device 10. The motion data includes the motion speed data and motion direction data of the electronic device 10, such as acceleration and angular velocity. Specifically, the feature processing module 1213 can extract the 2D features of a video image, and the pose estimation module 1212 can obtain the global pose of the electronic device 10 in the global sub-map according to the 2D features of the video image, the 3D map points of the global sub-map, and the motion data collected by the data collection module 13 (for example, the IMU).

In one embodiment, the pose estimation module 1212 is configured to track the global pose of the electronic device 10 at a relatively high first frequency, where the first frequency is the frequency at which the SLAM system performs global pose estimation, that is, the frequency at which it calls the global sub-map in the database 123; the first frequency may be numerically less than or equal to the frequency at which the display component displays the video stream. For example, the first frequency is 30 frames per second, although this is only an explanation rather than a limitation. In other words, the pose estimation module 1212 can call the global sub-map at a relatively high fixed frequency to track the pose of the electronic device 10. In this way, after the global pose of the electronic device 10 is obtained in real time, the position and posture of virtual objects in AR/VR/MR virtual scenes can be displayed on the display component according to that global pose.
The pose estimation module 1212 of the SLAM system can also be used to determine, according to the K-th frame image in the video image sequence collected by the camera and the SLAM map in the local coordinate system, the local pose of the electronic device 10 in that SLAM map (here the local pose may be called the first pose data), where K is an integer greater than or equal to 1. Specifically, the feature processing module 1213 can extract the 2D features of the K-th frame image, and the pose estimation module 1212 can obtain the first pose data of the electronic device 10 in the SLAM map in the local coordinate system according to the 2D features of the K-th frame image, the 3D map points of the SLAM map in the local coordinate system, and the motion data collected by the data collection module 13 (for example, the IMU); the motion data includes the motion speed data and motion direction data of the electronic device 10, such as acceleration and angular velocity.

The global positioning module 16 is used to determine, at the initial moment or at any moment thereafter, the global pose of the electronic device 10 in the global sub-map (here the global pose may be called the second pose data) according to the K-th frame image of the collected video images and the global sub-map, where K is an integer greater than or equal to 1. Specifically, the image retrieval module 161 obtains the K-th frame image in the video image sequence; the feature extraction module 162 performs feature extraction on the K-th frame image to obtain image features; the feature matching module 163 performs feature matching of the image features in the global sub-map to obtain map features that match the image features; and the pose estimation module 164 calculates, according to the image features and the map features, the second pose data of the electronic device 10 in the global sub-map.
The first pose data (local pose) and the second pose data (global pose) may be stored in the respective databases in the SDK. The coordinate system transformation matrix calculation module 15 may be configured to calculate, according to the first pose data and the second pose data, the coordinate system transformation information (for example, a coordinate system transformation matrix) between the local coordinate system of the SLAM map and the global coordinate system of the global map, and to feed that coordinate system transformation information back to the SLAM system.

The mapping module 1211 in the SLAM system may be configured to update the SLAM map in the database 122 of the SLAM system according to the pose data of the electronic device 10. Specifically, the mapping module 1211 may transform the SLAM map to the global coordinate system according to the coordinate system transformation information, and update the SLAM map in the global coordinate system by using the global pose of the electronic device 10 as pose data in that SLAM map.
In a specific embodiment, the functional modules in the electronic device 10 can cooperate with one another to perform the steps of the method shown in the embodiment of FIG. 7, or the electronic-device-side functions in the embodiments of FIG. 11 or FIG. 12.
In one embodiment, the processing module 22 in the server 20 further includes a sub-map processing module 221 and a location fingerprint matching module 222. The location fingerprint matching module 222 is configured to search the global map, according to the location fingerprint information of the initial position sent by the electronic device 10 (here called the first location fingerprint information), for location fingerprint information matching the first location fingerprint information (here called the second location fingerprint information). The sub-map processing module 221 is used to take out of the global map in the database 23 the global sub-map having the second location fingerprint information; this global sub-map may also be stored separately in a database of the server 20 for quick retrieval next time. The server 20 can send the global sub-map having the second location fingerprint information to the electronic device 10.

In a specific embodiment, the functional modules in the server 20 can cooperate with one another to perform the server-side functions in the embodiments of FIG. 11 or FIG. 12.
Based on the foregoing description, some virtual object display methods provided by the embodiments of this application are given below. For convenience, each of the method embodiments described below is expressed as a combination of a series of action steps; however, those skilled in the art should know that the specific implementation of the technical solutions of this application is not limited by the order of the described series of action steps.
Referring to FIG. 7, FIG. 7 is a schematic flowchart of a virtual object display method according to an embodiment of this application; in some implementations, the method may be applied to an electronic device having a display component and a camera. The method includes but is not limited to the following steps:

S1011: Detect an operation of a user opening an application.

S1012: In response to the operation, download a global sub-map and store it in the SLAM system of the electronic device, where the global sub-map is the sub-map in the global map that corresponds to the position of the electronic device.

S1013: Display the position and posture of a virtual object on the display component of the electronic device, where the position and posture of the virtual object are obtained by performing pose calculation at least according to the video images collected by the camera and the global sub-map.
In this embodiment of this application, the user inputs an operation for opening an application (APP) on the electronic device, for example clicking, touching, sliding, shaking, or voice control. In response to the operation, on the one hand the interface of the application is displayed on the display component of the electronic device (for example, a display panel or a lens); on the other hand, the process of downloading the global sub-map from the server or other devices (for example, other terminal devices, or storage media such as a hard disk or USB drive) is started. The application may be an AR/VR/MR application installed in the electronic device.
Exemplarily, referring to FIG. 8, in one possible implementation scenario, taking the electronic device 10 as a mobile phone, a certain virtual scene application (for example, a navigation map application with an AR function) is installed in the electronic device 10. FIG. 8(a) shows a graphical user interface (GUI) on the display panel of the electronic device 10, which is the desktop 101 of the electronic device. When the electronic device detects the user's operation of clicking the icon 102 of the virtual scene application on the desktop 101, on the one hand it starts the process of downloading the global sub-map in the background; on the other hand, after the virtual scene application is started, another GUI as shown in FIG. 8(b) is displayed on the display panel. The user interface 103 of this GUI, taking an AR navigation interface as an example, may also be called the first interface. The user interface 103 may include a viewfinder frame 104, in which a preview video stream of the real environment where the electronic device 10 is located can be displayed in real time; the preview video stream is shot by the camera of the electronic device 10. Virtual objects of the AR application are also superimposed on the preview video stream; the number of virtual objects may be one or more. As shown in FIG. 8(b), the virtual objects are exemplarily a navigation indicator 105, a storefront indicator 106, and an interactive robot 107: the navigation indicator 105 can use direction arrows to indicate in real time the navigation route from the current position to a destination; the storefront indicator 106 can, in real time and accurately, indicate the type and name of shops appearing explicitly or implicitly in the video stream of the viewfinder frame 104; and the interactive robot 107 can be used for voice dialogue, voice introduction, or simply as an amusing street-side display, and so on.
Specifically, in the background (processor), the electronic device 10 performs pose calculation according to the obtained global sub-map and the video images shot by the camera in real time, thereby obtaining in real time the pose data of the electronic device 10 in the global coordinate system (that is, the global pose). The position and posture of virtual objects in the AR scene can be determined by the global pose of the electronic device 10; that is, the global pose of the electronic device characterizes the position and posture of the virtual objects. Therefore, the position and posture of the virtual objects in the AR scene can be displayed in the viewfinder frame 104 in real time based on the global pose of the electronic device 10.

The AR application can use computer graphics and visualization technologies to generate virtual objects that do not exist in the real environment and, based on the current global pose of the electronic device 10, superimpose the virtual objects into the viewfinder frame 104, that is, superimpose them onto the video stream of the viewfinder frame 104.

For example, after the electronic device 10 captures video images through the camera, the AR application sends the video images to an AR cloud server to request the objects to be rendered that correspond to the video images; an object to be rendered may include information such as its identifier and/or name and metadata. The AR cloud server sends the objects to be rendered corresponding to the video images to the electronic device 10; the electronic device 10 determines the business rules corresponding to the objects to be rendered and renders each object according to its business rule, thereby generating one or more AR objects (that is, virtual objects), and superimposes the virtual objects onto the video stream of the viewfinder frame 104 based on the global pose of the electronic device 10.
In one embodiment, the global pose of the electronic device 10 is obtained by the electronic device 10 performing pose calculation at a relatively high first frequency, at least according to the video images collected by its camera and the global sub-map; the first frequency is the frequency at which the SLAM system in the electronic device 10 performs global pose estimation, that is, the frequency at which it calls the global sub-map, and may be numerically less than or equal to the frequency at which the display panel displays the video stream. In other words, the SLAM system of the electronic device 10 can call the global sub-map at a relatively high fixed frequency to track its pose. After the global pose of the electronic device 10 is obtained in real time, the position and posture of virtual objects in the AR scene can be displayed and updated in real time in the viewfinder frame 104 according to the global pose.
When the user walks with the electronic device 10 for a period of time (for example, 1 minute, 5 minutes, or 10 minutes, which is not limited here), the electronic device 10 can continuously and accurately superimpose the virtual objects onto the video stream of the viewfinder frame 104 throughout that period. As shown in FIG. 8(c), after the electronic device 10 has moved for a while, the navigation indicator 105, the storefront indicator 106, and the interactive robot 107 are still displayed accurately in the video stream of the viewfinder frame 104, with neither drift nor jumping during that period. The reason is that, on the one hand, during this period the SLAM system in the electronic device 10 calculates the pose based on the global sub-map, which overcomes the inaccuracy of the SLAM map pre-built in the electronic device 10 and avoids the accumulation of pose errors as far as possible, thereby overcoming drift to the greatest extent; on the other hand, the electronic device 10 can perform global pose estimation based on the global sub-map stably and frequently, avoiding sudden changes of the global pose and thus also overcoming jumping to the greatest extent, which improves user experience.
In one embodiment, when the electronic device 10 is about to move out of the geographic range of the global sub-map, it can request to download a new global sub-map in advance based on its position; the electronic device 10 can subsequently perform global pose estimation based on the new global sub-map, which further avoids sudden pose changes when switching between the geographic ranges of two global sub-maps and further improves user experience.
Referring to FIG. 9, in another possible implementation scenario, the process from the user clicking the virtual scene application to the display panel presenting the AR picture may also be implemented as follows.

Again taking the electronic device 10 as a mobile phone with a virtual scene application installed (for example, a navigation map application with an AR function), FIG. 9(a) shows the desktop 101 on the display panel of the electronic device 10. When the electronic device detects the user's operation of clicking the icon 102 of the virtual scene application on the desktop 101, the user interface 108 shown in FIG. 9(b) is displayed on the display panel; the user interface 108 includes text boxes for entering an account and password, prompting the user to log in to the application by verifying identity. After the electronic device 10 detects that the user has entered the account and password to log in, and after it verifies that they are correct, on the one hand the application's user interface 103 shown in FIG. 9(c) is displayed on the display panel. In FIG. 9(c), the user interface 103 may include a viewfinder frame 104, in which a preview video stream of the real environment where the electronic device 10 is located can be displayed in real time; the preview video stream is shot by the camera of the electronic device 10. Optionally, a preset navigation input box may also be superimposed on the preview video stream for entering the origin and destination for AR navigation. On the other hand, the process of downloading the global sub-map is started in the background; the download may take a certain amount of time (for example, 2 seconds), but since the live video stream is already being presented to the user and the user also needs some time to enter the origin and destination for AR navigation, the user does not perceive the existence of the map download process. In this way, the waiting caused by download latency is avoided, and user experience is further improved.

Likewise, after the global sub-map has been downloaded, the electronic device 10 can, in the background processor, perform pose calculation according to the obtained global sub-map and the video images shot by the camera in real time, thereby obtaining in real time its current pose data in the global coordinate system (that is, the global pose). The position and posture of virtual objects in the AR scene can then be displayed in the viewfinder frame 104 in real time based on the global pose of the electronic device 10. As shown in FIG. 9(d), the virtual objects are exemplarily the navigation indicator 105, the storefront indicator 106, and the interactive robot 107. The SLAM system of the electronic device 10 can call the global sub-map at a relatively high fixed frequency to track the pose of the electronic device 10; after the global pose is obtained in real time, the position and posture of the virtual objects in the AR scene can be displayed and updated in real time in the viewfinder frame 104 according to the global pose, overcoming drift and jumping to the greatest extent.
Referring to FIG. 10, in yet another possible implementation scenario, the process from the user tapping the virtual scene application to the display panel presenting the AR picture may also be implemented as follows:

Again taking the electronic device 10 as a mobile phone, a virtual scene application (for example, a navigation map application with an AR function) is installed on the electronic device 10. (a) in FIG. 10 shows the desktop 101 on the display panel of the electronic device 10. When the electronic device detects an operation of the user tapping the icon 102 of the virtual scene application on the desktop 101, a user interface 108 of the application shown in (b) of FIG. 10 is displayed on the display panel of the electronic device 10. Here, the user interface 108 includes an electronic map interface and a plurality of controls 109, shown in the figure as an electronic map control, a satellite map control, and an AR control. After the electronic device 10 detects that the user has tapped the AR control, on the one hand, the user interface 103 of the application shown in (c) of FIG. 10 is displayed on the display panel. In (c) of FIG. 10, the user interface 103 may include a viewfinder frame 104, within which a preview video stream of the real environment where the electronic device 10 is located can be displayed in real time; the preview video stream is captured by the camera of the electronic device 10. Optionally, a preset navigation input box may also be superimposed on the preview video stream, so that the user can enter the departure point and destination for AR navigation. On the other hand, the process of downloading the global submap is started in the background. Since the live video stream is already being presented to the user, and the user also needs some time to enter the departure point and destination for AR navigation, the user does not perceive the map download process. In this way, waiting caused by the download delay is avoided, further improving user experience.

Likewise, after the global submap is downloaded, the electronic device 10 may, on the background processor, perform pose calculation according to the obtained global submap and the video images captured by the camera in real time, thereby obtaining in real time the current pose data of the electronic device 10 in the global coordinate system (that is, the global pose). The position and pose of the virtual objects in the AR scene can then be displayed in the viewfinder frame 104 in real time based on the global pose of the electronic device 10. As shown in (d) of FIG. 10, the virtual objects are exemplified by the navigation indicator 105, the storefront labels 106, and the interactive robot 107. The SLAM system of the electronic device 10 can invoke the global submap at a relatively high fixed frequency to track the pose of the electronic device 10. Thus, after the global pose of the electronic device 10 is obtained in real time, the position and pose of the virtual objects in the AR scene can be displayed and updated in real time in the viewfinder frame 104 according to that global pose, suppressing drift and jitter to the greatest extent.
Referring to FIG. 11, FIG. 11 is a schematic flowchart of another virtual object display method according to an embodiment of this application, described from the electronic device side and the server side respectively. The method includes, but is not limited to, the following steps:

S201: The server delivers, upon request of the electronic device, a global submap to the electronic device. Correspondingly, the electronic device receives the global submap. The global submap is a submap of the global map corresponding to the position of the electronic device.

S202: The electronic device stores the global submap into the SLAM system of the electronic device.

S203: The electronic device performs pose calculation at a first frequency according to the video images captured by the camera and the global submap, to continuously update the global pose of the electronic device.
The first frequency is the frequency at which the SLAM system of the electronic device performs global pose estimation, that is, the frequency at which the SLAM system invokes the global submap. The first frequency may be numerically less than or equal to the frequency at which the display component displays the video stream; for example, the first frequency may be 30 frames per second, though this is merely illustrative and not limiting. In other words, the SLAM system can invoke the global submap at a relatively high fixed frequency to track the pose of the electronic device. Thus, after the global pose of the electronic device is obtained in real time, the position and pose of virtual objects in an AR/VR/MR virtual scene can be displayed on the display component according to the global pose of the electronic device.
The global pose of the electronic device represents the position and attitude (orientation) of the electronic device in the global coordinate system. For example, the position may be expressed along three coordinate axes as (x, y, z), and the attitude (orientation) may be expressed as (α, β, γ), the angles of rotation about the three coordinate axes.
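As an illustration of this (x, y, z) plus (α, β, γ) parameterization, the sketch below assembles a 4×4 homogeneous pose matrix with NumPy. The rotation order Rz·Ry·Rx is an assumption made only for the example; the application does not fix an angle convention.

```python
import numpy as np

def pose_matrix(x, y, z, alpha, beta, gamma):
    """Build a 4x4 homogeneous pose from a position (x, y, z) and rotation
    angles (alpha, beta, gamma), in radians, about the x-, y-, and z-axes.
    The composition order Rz @ Ry @ Rx is one common convention."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = (x, y, z)
    return T
```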
In a possible embodiment, the electronic device is provided with a camera. In each global pose calculation, the input signals of the SLAM system include the video images captured by the camera and the global submap; the SLAM system can match the captured video images against the global submap, thereby calculating the global pose of the electronic device in the global submap.
In another possible embodiment, the electronic device is provided with an inertial measurement unit (IMU). In each global pose calculation, the input signals of the SLAM system include the video images captured by the camera, the motion data collected by the IMU, and the global submap. The IMU detects the angular velocity and linear acceleration of the electronic device at a high frequency and integrates each of them, from which a pose of the electronic device can be computed (referred to here as the IMU-measured pose). By matching the video images captured by the camera against the global submap, another pose of the electronic device can also be computed (referred to here as the image-measured pose). A more accurate final pose can then be obtained by jointly computing from the IMU-measured pose and the image-measured pose; this final pose serves as the global pose of the electronic device in the global submap.
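The joint computation of the IMU-measured pose and the image-measured pose is normally performed inside a filter or an optimizer; as a stand-in for that machinery, the sketch below simply blends the two poses, averaging the rotations with SciPy. The fixed weight `w_img` is a hypothetical parameter chosen for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def fuse_poses(T_imu, T_img, w_img=0.7):
    """Illustrative fusion of an IMU-measured pose and an image-measured
    pose (both 4x4 homogeneous matrices) into one final pose estimate."""
    # Weighted blend of the translations.
    t = (1 - w_img) * T_imu[:3, 3] + w_img * T_img[:3, 3]
    # Weighted average of the rotations on the rotation manifold.
    R = Rotation.from_matrix(
        np.stack([T_imu[:3, :3], T_img[:3, :3]])
    ).mean(weights=[1 - w_img, w_img])
    T = np.eye(4)
    T[:3, :3] = R.as_matrix()
    T[:3, 3] = t
    return T
```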
In yet another possible embodiment, besides the camera and the IMU, the electronic device is also provided with a positioning module related to pose or motion (GPS positioning, BeiDou positioning, WiFi positioning, base station positioning, or the like). The SLAM system may then jointly consult the video images captured by the camera, the motion data collected by the IMU, the global submap, and the data collected by the positioning module to calculate the global pose of the electronic device in the global submap.

S204: The electronic device displays the virtual object on the display component based on the global pose of the electronic device.
It can be seen that the IMU signal and the image signal serve as inputs to SLAM in the process of estimating the camera pose, and this process internally uses the SLAM map data as an input to pose estimation. Although the internal loop-closure correction module of SLAM can reduce the long-term drift of the SLAM map, considerable error still remains. Therefore, in this application, after the submap has been downloaded, the submap is used as an input to the pose estimation module, playing the same role that the SLAM map plays in pose estimation. However, the submap is more accurate than the SLAM map; using it as the pose estimation input eliminates the long-term drift of pose estimation, and can also eliminate the long-term drift and the jitter of the SLAM map.
It can be seen that, in this embodiment of this application, the electronic device downloads the global submap by requesting it from the server and feeds the global submap, which is more accurate than the SLAM map, to the SLAM system as visual observations. The SLAM system of the electronic device can invoke the global submap at a relatively high fixed frequency to track the pose of the electronic device. Thus, after obtaining the global pose of the electronic device in real time, the position and pose of the virtual objects in the AR scene can be displayed and updated in real time on the display component according to the global pose of the electronic device, and over a long period of pose updates the position and pose of the virtual objects exhibit neither drift nor jitter. First, during this period the SLAM system in the electronic device calculates the global pose based on the global submap, which overcomes the inaccuracy of the SLAM map pre-built in the electronic device and avoids the accumulation of pose errors as much as possible, suppressing drift to the greatest extent. Second, the electronic device can stably perform global pose estimation based on the global submap at a high frequency, greatly reducing abrupt changes in the global pose. Third, the global pose calculation is completed on the electronic device side, so the pose estimation algorithm has low latency and good tracking performance. Therefore, this embodiment can display virtual objects for a long time without the virtual objects drifting out of place in the picture, eliminates the jitter of virtual objects caused by abrupt pose changes, and further improves user experience.
Referring to FIG. 12, FIG. 12 is a schematic flowchart of yet another virtual object display method according to an embodiment of this application, described from the electronic device side and the server side respectively. The method includes, but is not limited to, the following steps:

S301: The electronic device sends, to the server, first location fingerprint information indicating the initial position of the electronic device.

In this embodiment of this application, the electronic device uploads location fingerprint information to the server. The initial position may be the geographic position of the electronic device at the time it requests the map download; for example, the sources of location fingerprint information include initial position information measured by positioning methods such as GNSS/WiFi/Bluetooth/base stations, signal strength information, and signal feature information. It may also be position information entered by the user.

S302: The server obtains, from the global map, the global submap matching the first location fingerprint information. The global submap is a submap of the global map corresponding to the position of the electronic device.

S303: The server delivers the global submap to the electronic device. Correspondingly, the electronic device receives the global submap.
In an embodiment, the server matches the first location fingerprint information against the location fingerprint information of the submaps pre-stored in its database, where the submaps in the database belong to the global map. If a submap matches, the matched submap is transmitted to the electronic device.

In another embodiment, the server traverses the global map stored on the server according to the first location fingerprint information until a region whose location fingerprint information matches is found; the server extracts this region from the global map as the global submap and transmits the global submap to the electronic device.
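A server-side lookup of this kind might be sketched as follows. The fingerprint schema (sets of observed WiFi BSSIDs) and the overlap score are assumptions chosen purely for illustration, not the matching rule specified by this application.

```python
from dataclasses import dataclass

@dataclass
class Submap:
    fingerprint: dict   # e.g. {"wifi_bssids": {...}, "cell_id": ...}
    map_blob: bytes     # serialized 3D map points / keyframes

def find_submap(first_fingerprint, submap_db, min_overlap=0.5):
    """Score each stored submap by the fraction of the device's observed
    WiFi BSSIDs it shares, and return the best match above a threshold."""
    seen = set(first_fingerprint.get("wifi_bssids", ()))
    best, best_score = None, 0.0
    for sub in submap_db:
        known = set(sub.fingerprint.get("wifi_bssids", ()))
        score = len(seen & known) / max(1, len(seen))
        if score > best_score:
            best, best_score = sub, score
    return best if best_score >= min_overlap else None
```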
S304: The electronic device stores the global submap into the SLAM system of the electronic device.

S305: The electronic device calculates coordinate system transformation information and transforms the SLAM map based on the coordinate system transformation information.

Specifically, the process by which the electronic device calculates the coordinate system transformation information can be described as follows. The electronic device obtains the K-th frame in the video image sequence captured by the camera, where K is an integer greater than or equal to 1. Then, on the one hand, it determines, according to the K-th frame and the SLAM map, the local pose of the electronic device in the SLAM map under the previously constructed local coordinate system (referred to here as the first pose data).
For example, the electronic device is provided with an IMU, and the input signals of the SLAM system include the video images captured by the camera, the motion data collected by the IMU, and the SLAM map under the local coordinate system. The IMU detects the angular velocity and linear acceleration of the electronic device at a high frequency and integrates each of them, from which a pose of the electronic device can be computed. By matching the video images captured by the camera against the SLAM map under the local coordinate system, another pose of the electronic device can also be computed. The first pose data can then be obtained by computing from these two poses with a certain algorithm.

As another example, besides the camera and the IMU, the electronic device is also provided with a positioning module related to pose or motion (GPS positioning, BeiDou positioning, WiFi positioning, base station positioning, or the like). The SLAM system may then jointly consult the video images captured by the camera, the motion data collected by the IMU, the SLAM map under the local coordinate system, and the data collected by the positioning module to calculate the first pose data.

On the other hand, according to the K-th frame and the global submap, feature extraction may be performed on the K-th frame and feature matching may be performed against the global submap, thereby determining the global pose of the electronic device in the global submap (referred to here as the second pose data).
For example, the electronic device performs feature detection on the K-th frame to extract the image positions of features; feature detection algorithms include, but are not limited to, FAST, ORB, SIFT, SURF, D2Net, and SuperPoint. Each detected feature is then described; feature description algorithms include, but are not limited to, ORB, SIFT, SURF, BRIEF, BRISK, FREAK, D2Net, and SuperPoint. This yields a one-dimensional vector used for subsequent feature matching. Through feature matching, the electronic device can retrieve from the global submap the map content most similar to the K-th frame (for example, one or more keyframes); concrete methods include traditional image retrieval methods based on BOW or VLAD, as well as newer AI-based retrieval methods such as NetVLAD. Feature matching specifically computes the similarity between two feature descriptors: for float vectors, via Euclidean distance; for binary vectors, via XOR. After the map content most similar to the K-th frame is found, pose estimation can be performed based on the K-th frame and that most similar map content, for example using registration algorithms such as PnP, EPnP, or 3D-3D, from which the second pose data can be computed.
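With OpenCV, the detect–describe–match–register chain described above can be sketched as follows. The `keyframe` object, carrying ORB descriptors (`keyframe.desc`) and the global-frame 3D map point for each descriptor (`keyframe.points3d`), is a hypothetical stand-in for a keyframe retrieved from the global submap; ORB with RANSAC-PnP is just one of the algorithm combinations the text lists.

```python
import cv2
import numpy as np

def estimate_second_pose(frame_gray, keyframe, K_cam):
    """Match ORB features of the K-th frame against a retrieved map keyframe
    and register the frame to the global submap with PnP."""
    orb = cv2.ORB_create(nfeatures=2000)
    kps, desc = orb.detectAndCompute(frame_gray, None)

    # Binary (ORB) descriptors are compared by Hamming distance (XOR-based).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc, keyframe.desc)

    pts2d = np.float32([kps[m.queryIdx].pt for m in matches])
    pts3d = np.float32([keyframe.points3d[m.trainIdx] for m in matches])

    # PnP registration: global 3D map points against their 2D observations.
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts3d, pts2d, K_cam, None)
    return (rvec, tvec) if ok else None
```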
The electronic device can then obtain, according to the first pose data and the second pose data, the coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map; the coordinate system transformation information may be, for example, a coordinate system transformation matrix.

For example, as shown in FIG. 13, let the pose of the K-th frame in the local coordinate system (the first pose data) be ${}^{L}T_c$, and its pose in the global coordinate system (the second pose data) be ${}^{G}T_c$. Then:

$${}^{G}T_c = {}^{G}T_L \, {}^{L}T_c \qquad (1)$$

$${}^{G}T_L = {}^{G}T_c \, \left({}^{L}T_c\right)^{-1} \qquad (2)$$

Here, ${}^{G}T_L$ denotes the coordinate system transformation matrix between the local coordinate system and the global coordinate system. With this matrix, the two coordinate systems can be synchronized: information originally expressed in the local coordinate system (for example, local poses, image feature points, and 3D map points of the SLAM map) can be transformed into the global coordinate system based on the coordinate system transformation matrix.
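Equations (1) and (2) translate directly into a few lines of NumPy: given the same frame's local pose and global pose as 4×4 matrices, the transform ${}^{G}T_L$ is recovered and can then be applied to re-express SLAM map points (or poses) in the global coordinate system.

```python
import numpy as np

def local_to_global_transform(T_Lc, T_Gc):
    """Equation (2): GT_L = GT_c @ (LT_c)^-1, from one frame's pose in the
    local SLAM coordinate system (T_Lc) and in the global one (T_Gc)."""
    return T_Gc @ np.linalg.inv(T_Lc)

def transform_map_points(T_GL, points_local):
    """Re-express SLAM 3D map points (N x 3, local frame) in the global frame."""
    pts_h = np.hstack([points_local, np.ones((len(points_local), 1))])
    return (T_GL @ pts_h.T).T[:, :3]
```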
In a specific embodiment, after obtaining the coordinate system transformation information, the electronic device can transform the SLAM map from the local coordinate system into the global coordinate system according to this information, that is, obtain the SLAM map under the global coordinate system.

S306: The electronic device performs pose calculation according to the video images captured by the camera and the global submap, to obtain the global pose of the electronic device.
After the coordinate system synchronization in S305, the 3D map points in the global submap and the coordinate system of the SLAM system are unified. In some embodiments, the 3D map points in the global submap can be fed to the SLAM system as measurement inputs, achieving tight coupling between the global submap and the SLAM system, and the global pose of the electronic device can then be tracked in real time through pose estimation.

Specifically, based on the coordinate system transformation information, the pose in the local coordinate system (the local pose) and the 3D map points of the SLAM map can be transformed into the global coordinate system, so that the poses and 3D map points in the SLAM system and the 3D map points in the global submap are expressed in the same coordinate system. The 3D map points in the global submap can then serve as measurements for the SLAM system, which effectively eliminates the drift of SLAM pose tracking.
For example, in the pose estimation algorithm, the traditional visual measurements (the 3D map points of the SLAM map) are computed by the triangulation algorithm of the SLAM system in the local coordinate system, and the accuracy of the triangulated 3D map points depends on the accuracy of pose estimation. Since pose estimation drifts over long periods, the 3D map points of the SLAM map computed by the SLAM system carry considerable error; in turn, when these 3D map points are used as measurements, they introduce considerable error into pose estimation, as shown in Equation (3):

$$e^{L} = \sum_{i}\sum_{j}\left\| z_{ij} - h\!\left({}^{L}P_i,\, {}^{L}p_j\right) \right\|^2 \qquad (3)$$

where $e^{L}$ denotes the pose error produced in the traditional approach; $i$ is the image frame index; $j$ is the index of a feature observed in a given image frame; $L$ denotes a quantity expressed in the local coordinate system of the SLAM system; $p$ denotes the 3D coordinates of a feature point; $P$ denotes the pose of the electronic device; $z$ denotes the observed value of a 2D feature in the image; and $h({}^{L}P_i, {}^{L}p_j)$ denotes the 2D coordinates obtained by projecting, through the camera projection function $h$, the 3D coordinates corresponding to the 2D feature under the estimated pose of the electronic device. ${}^{L}p_j$ denotes the 3D map point coordinates computed by the triangulation algorithm of the SLAM system, which act as measurements in the SLAM algorithm.
In this embodiment of this application, by replacing the 3D map points that the SLAM system generates for the SLAM map with the more accurate 3D map points from the global submap, the pose estimation error can be eliminated, as shown in Equation (4):

$$e^{G} = \sum_{i}\sum_{j}\left\| z_{ij} - h\!\left({}^{G}P_i,\, {}^{G}p_j\right) \right\|^2 \qquad (4)$$

Here, $e^{G}$ denotes the pose error produced by the solution in this embodiment of this application; $G$ denotes a quantity expressed in the global coordinate system; and ${}^{G}p_j$ denotes the 3D coordinates of the map point corresponding to the $j$-th feature observed in the $i$-th image frame, taken from the global submap. After the coordinates of the SLAM map points have been converted into the global coordinate system, ${}^{G}p_j$ may also come from the SLAM map under the global coordinate system. ${}^{G}p_j$ is a 3D map point in the global submap, which acts as a measurement in the SLAM algorithm.
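The residual $z_{ij} - h(\cdot)$ appearing in Equations (3) and (4) can be evaluated as in the sketch below, which projects global-frame map points through an undistorted pinhole model; lens distortion and robust weighting are omitted for brevity.

```python
import numpy as np

def reprojection_residuals(T_Gc, points_g, obs_2d, K_cam):
    """Per-point residuals z_ij - h(GP_i, Gp_j) for one frame: project
    global-frame 3D map points (N x 3) with the camera's global pose
    (4x4, camera-to-global) and subtract from the 2D observations (N x 2)."""
    T_cG = np.linalg.inv(T_Gc)                 # global -> camera
    pts_h = np.hstack([points_g, np.ones((len(points_g), 1))])
    pts_cam = (T_cG @ pts_h.T).T[:, :3]
    proj = (K_cam @ pts_cam.T).T               # pinhole projection h(.)
    proj = proj[:, :2] / proj[:, 2:3]
    return obs_2d - proj
```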
S307: The electronic device displays the virtual object on the display component based on the global pose of the electronic device.

S308: The electronic device updates the SLAM map under the global coordinate system based on the global pose of the electronic device.

Understandably, since both the pose of the electronic device and the SLAM map have been converted into the global coordinate system, the electronic device can display and update the virtual object on the display component in real time based on its global pose, and can also feed its global pose back into the SLAM map under the global coordinate system, fusing the current image frame (keyframe) into that map based on the global pose. This extends the SLAM map, and the updated SLAM map is more accurate than a traditional SLAM map.
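Schematically, the S308 update might look like the following; the `slam_map` methods are assumed placeholders for whatever keyframe-insertion and triangulation interfaces a concrete SLAM system exposes.

```python
def update_global_slam_map(slam_map, frame, global_pose, is_keyframe):
    """If the current frame is a keyframe, anchor it in the global-frame
    SLAM map at the drift-free global pose, so that newly triangulated
    points extend the map directly in global coordinates."""
    if not is_keyframe:
        return
    slam_map.add_keyframe(frame, pose=global_pose)   # assumed map API
    slam_map.triangulate_new_points(frame)           # new 3D points, global frame
```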
It can be seen that, in this embodiment of this application, by obtaining from the same frame both the pose of the terminal in the local coordinate system and its pose in the global coordinate system, the coordinate system transformation information between the two coordinate systems (for example, a coordinate system transformation matrix) can be derived from these two poses. With this transformation matrix, the two coordinate systems can be synchronized, so that information originally expressed in the local coordinate system (for example, local poses, image feature points, and 3D map points of the SLAM map) can be transformed into the global coordinate system. The poses and 3D map points in the SLAM system and the 3D map points in the global submap are thus expressed in the same coordinate system. The 3D map points in the global submap can then serve as measurement inputs to the SLAM system, achieving tight coupling between the global submap and the SLAM system, and the global pose of the electronic device can be tracked in real time through pose estimation, which effectively eliminates the drift of SLAM pose tracking. When the SLAM map subsequently needs to be updated, the global pose of the electronic device can be used as the pose data in the SLAM map under the global coordinate system to update the SLAM map under the second coordinate system, improving the accuracy of the SLAM map.

In addition, this embodiment of this application makes full use of the computing power of the electronic device to calculate the first pose data, the second pose data, and the coordinate system transformation information, which improves processing efficiency, reduces processing latency, and also relieves the computational burden on the server.
Referring to FIG. 14, FIG. 14 shows, in yet another specific implementation, the components that the functional modules of the electronic device 10 shown in FIG. 5, and the functional modules of the server 20, may further include. The main difference between the embodiment of FIG. 14 and the foregoing embodiment of FIG. 6 is that, in the functional module architecture shown in FIG. 14, the functions of the global positioning module 16 are implemented on the server 20 side; that is, the server 20 further includes an image retrieval module 161, a feature extraction module 162, a feature matching module 163, and a pose estimation module 164.

The global positioning module 16 in the server 20 is configured to obtain, at the initial moment or any moment thereafter, at least one video image frame uploaded by the electronic device, calculate the global pose of the electronic device 10 in the global submap (that is, the second pose data) based on that video image, and send the second pose data to the electronic device 10. Specifically, the image retrieval module 161 may obtain the K-th frame of the video image sequence uploaded by the electronic device 10; the feature extraction module 162 performs feature extraction on the K-th frame to obtain image features; the feature matching module 163 matches the image features against the global submap to obtain the map features matching the image features; and the pose estimation module 164 calculates, according to the image features and the map features, the second pose data of the electronic device 10 in the global submap and sends the second pose data to the electronic device 10.
For the components that the other functional modules of the server 20 shown in FIG. 14 may further include, reference may be made to the related descriptions of the server 20 in the foregoing embodiment of FIG. 6; for brevity of the specification, details are not repeated here.

For the components that the functional modules of the electronic device 10 shown in FIG. 14 may further include, reference may be made to the related descriptions of the electronic device 10 in the foregoing embodiment of FIG. 6; for brevity of the specification, details are likewise not repeated here.

In a specific embodiment, the functional modules in the electronic device 10 may cooperate with one another to perform the functions on the electronic device side in the embodiment of FIG. 15, and the functional modules in the server 20 may cooperate with one another to perform the functions on the server side in the embodiment of FIG. 15.
Referring to FIG. 15, FIG. 15 is a schematic flowchart of yet another virtual object display method according to an embodiment of this application, described from the electronic device side and the server side respectively. The method includes, but is not limited to, the following steps:

S501: The electronic device sends, to the server, first location fingerprint information indicating the initial position of the electronic device and at least one video image frame.
In a specific embodiment, to accomplish the first global positioning of the electronic device, the electronic device needs to upload to the server its location fingerprint information and one or more currently captured video images. The at least one video image frame may be the K-th frame of the video image sequence captured by the camera of the electronic device, for example, the first frame of that sequence. The initial position indicated by the location fingerprint information may be the geographic position of the electronic device when it requests the map download; for example, the sources of location fingerprint information include initial position information measured by positioning methods such as GNSS/WiFi/Bluetooth/base stations, signal strength information, and signal feature information. It may also be position information entered by the user.

In one specific implementation, the electronic device may package the first location fingerprint information and a video image frame together and send them to the server.

In another specific implementation, the electronic device may instead send the first location fingerprint information and a video image frame to the server independently.
S502: The server obtains, from the global map, the global submap matching the first location fingerprint information. The global submap is a submap of the global map corresponding to the position of the electronic device.

In an embodiment, the server matches the first location fingerprint information against the location fingerprint information of the submaps pre-stored in its database, where the submaps in the database belong to the global map. If a submap matches, that submap is the global submap subsequently delivered to the electronic device.

In another embodiment, the server traverses the global map stored on the server according to the first location fingerprint information until a region whose location fingerprint information matches is found; the server extracts this region from the global map as the global submap.
S503: The server performs pose calculation according to the video image and the global submap, to obtain the global pose of the electronic device in the global submap (here also called the second pose data).

In an embodiment of this application, the first calculation of the electronic device's global pose can be completed on the server side. The server's global pose calculation likewise involves image retrieval, feature extraction, feature matching, pose estimation, and so on.
For example, the server performs feature detection on the video image to extract the image positions of features; feature detection algorithms include, but are not limited to, FAST, ORB, SIFT, SURF, D2Net, and SuperPoint. Each detected feature is then described; feature description algorithms include, but are not limited to, ORB, SIFT, SURF, BRIEF, BRISK, FREAK, D2Net, and SuperPoint. This yields a one-dimensional vector used for subsequent feature matching. Through feature matching, the server can retrieve from the global submap the map content most similar to the video image (for example, one or more keyframes); concrete methods include traditional image retrieval methods based on BOW or VLAD, as well as newer AI-based retrieval methods such as NetVLAD. Feature matching specifically computes the similarity between two feature descriptors: for float vectors, via Euclidean distance; for binary vectors, via XOR. After the map content most similar to the video image is found, pose estimation can be performed based on the video image and that most similar map content, for example using registration algorithms such as PnP, EPnP, or 3D-3D, from which the second pose data can be computed.
S504: After completing the pose estimation, the server delivers the global pose of the electronic device in the global submap (the second pose data) to the electronic device. Correspondingly, the electronic device receives the second pose data. The electronic device can subsequently use the second pose data to calculate the coordinate system transformation information (for example, a coordinate system transformation matrix).

S505: The server delivers the global submap to the electronic device. Correspondingly, the electronic device receives the global submap.

S506: The electronic device stores the global submap into the SLAM system of the electronic device.

S507: The electronic device calculates the coordinate system transformation information and transforms the SLAM map based on the coordinate system transformation information.
Likewise, the process by which the electronic device calculates the coordinate system transformation information can be described as follows. The electronic device obtains the K-th frame in the video image sequence captured by the camera, where the K-th frame is the same image as the video image sent to the server in S501. Then, on the one hand, it determines, according to the K-th frame and the SLAM map, the local pose of the electronic device in the SLAM map under the previously constructed local coordinate system (referred to here as the first pose data). For the specific implementation process, reference may be made to the description of the first pose data in S305 of the embodiment of FIG. 12; for brevity of the specification, details are not repeated here.

The electronic device can then obtain, according to the first pose data and the second pose data obtained through S504, the coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map; the coordinate system transformation information may be, for example, a coordinate system transformation matrix. For the details of the coordinate system transformation matrix, reference may be made to the related description of the embodiment of FIG. 13; for brevity of the specification, details are likewise not repeated here.
S508: The electronic device performs pose calculation according to the video images captured by the camera and the global submap, to obtain the global pose of the electronic device.

S509: The electronic device displays the virtual object on the display component based on the global pose of the electronic device.

S510: The electronic device updates the SLAM map under the global coordinate system based on the global pose of the electronic device.

For the content of S508–S510, reference may be made to the related descriptions of S306–S308 of the embodiment of FIG. 12; for brevity of the specification, details are likewise not repeated here.
It can be seen that, in this embodiment of this application, the electronic device needs to first download the global submap of the corresponding region, and downloading the map takes some time. To speed up the user's entry into the application, the first global pose estimation may be completed on the server side. That is, after the application is started, the first global pose estimation is performed on the server side; while global pose estimation is started, the server correspondingly obtains the global submap and transmits it to the electronic device, improving the speed at which the user enters the application. The user does not perceive the delay of the map download process; in this way, waiting caused by the download delay is avoided, further improving user experience.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one network site, computer, server, or data center to another by wired means (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless means (for example, infrared or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of the other embodiments.

Claims (39)

  1. A virtual object display method, applied to an electronic device having a display component and a camera, wherein the method comprises:
    detecting an operation of a user opening an application;
    in response to the operation, downloading a global submap and storing it into a simultaneous localization and mapping (SLAM) system of the electronic device, wherein the global submap is a submap of a global map corresponding to a position of the electronic device;
    displaying a position and pose of the virtual object on the display component, wherein the position and pose of the virtual object are obtained by the SLAM system performing pose calculation at least according to video images captured by the camera and the global submap.
  2. The method according to claim 1, wherein
    pose data of the electronic device is used to characterize the position and pose of the virtual object, and the pose data of the electronic device is obtained by the SLAM system performing pose calculation at a first frequency at least according to the video images captured by the camera and the global submap.
  3. The method according to claim 1 or 2, wherein the pose calculation process comprises:
    performing pose calculation according to the video images captured by the camera, the global submap, and motion data collected by the electronic device, to obtain the pose data of the electronic device, wherein the motion data comprises motion speed data and motion direction data.
  4. The method according to any one of claims 1 to 3, wherein the downloading a global submap in response to the operation comprises:
    in response to the operation, sending indication information of an initial position of the electronic device to a server;
    receiving the global submap from the server, wherein the global submap is determined according to the initial position of the electronic device.

  5. The method according to claim 4, wherein the indication information of the initial position of the electronic device comprises first location fingerprint information indicating the initial position of the electronic device; the global submap corresponds to second location fingerprint information, and the first location fingerprint information matches the second location fingerprint information.
  6. The method according to any one of claims 2 to 5, wherein the method further comprises:
    updating a SLAM map of the SLAM system according to the pose data of the electronic device.
  7. The method according to claim 6, wherein before the updating a SLAM map of the SLAM system according to the pose data of the electronic device, the method further comprises:
    determining, according to a K-th frame of the video images captured by the camera and a SLAM map under a first coordinate system, first pose data of the electronic device in the SLAM map under the first coordinate system, wherein K is an integer greater than or equal to 1;
    determining, according to the K-th frame and the global submap under a second coordinate system, second pose data of the electronic device in the global submap under the second coordinate system;
    obtaining, according to the first pose data and the second pose data, coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map;
    transforming, according to the coordinate system transformation information, the SLAM map under the first coordinate system into a SLAM map under the second coordinate system;
    correspondingly, the updating a SLAM map of the SLAM system according to the pose data of the electronic device comprises:
    updating the SLAM map under the second coordinate system by using the pose data of the electronic device as pose data in the SLAM map under the second coordinate system.
  8. The method according to claim 7, wherein the determining, according to the K-th frame of the video images captured by the camera and the SLAM map under the first coordinate system, the first pose data of the electronic device in the SLAM map under the first coordinate system comprises:
    obtaining, according to the K-th frame, the SLAM map under the first coordinate system, and motion data collected by the electronic device, the first pose data of the electronic device in the SLAM map under the first coordinate system, wherein the motion data comprises motion speed data and motion direction data.
  9. The method according to claim 7 or 8, wherein the determining, according to the K-th frame and the global submap under the second coordinate system, the second pose data of the electronic device in the global submap under the second coordinate system comprises:
    performing feature extraction on the K-th frame to obtain image features;
    performing feature matching with the image features in the global submap under the second coordinate system to obtain map features matching the image features;
    calculating, according to the image features and the map features, the second pose data of the electronic device in the global submap under the second coordinate system.

  10. The method according to claim 7 or 8, wherein the determining, according to the K-th frame and the global submap under the second coordinate system, the second pose data of the electronic device in the global submap under the second coordinate system comprises:
    sending the K-th frame to a server;
    receiving the second pose data from the server, wherein the second pose data is determined by the server by performing feature extraction and feature matching according to the K-th frame and the global submap under the second coordinate system.
  11. The method according to any one of claims 1 to 10, wherein the displaying the position and pose of the virtual object on the display component comprises:
    displaying a first interface on the display component, and displaying a video image and the virtual object on the first interface, wherein the position and pose of the virtual object relative to the video image are displayed based on pose data of the electronic device, and the pose data of the electronic device is obtained by performing the pose calculation process at least according to the video image captured by the camera and the global submap.
  12. A virtual object display method, applied to an electronic device having a display component and a camera, wherein the method comprises:
    obtaining a global submap and storing it into a simultaneous localization and mapping (SLAM) system of the electronic device, wherein the global submap is a submap of a global map corresponding to a position of the electronic device;
    performing pose calculation according to video images captured by the camera and the global submap, to obtain pose data of the electronic device;
    displaying the virtual object on the display component based on the pose data of the electronic device.
  13. The method according to claim 12, wherein the performing pose calculation according to the video images captured by the camera and the global submap, to obtain the pose data of the electronic device, comprises:
    performing pose calculation at a first frequency at least according to the video images captured by the camera and the global submap, to obtain the pose data of the electronic device.

  14. The method according to claim 12 or 13, wherein the pose data of the electronic device is pose data of the electronic device under a first coordinate system or pose data of the electronic device under a second coordinate system; the first coordinate system is the coordinate system of the SLAM map of the SLAM system, and the second coordinate system is the coordinate system of the global submap.

  15. The method according to any one of claims 12 to 14, wherein the performing pose calculation according to the video images captured by the camera and the global submap, to obtain the pose data of the electronic device, comprises:
    performing pose calculation according to the video images captured by the camera, the global submap, and motion data collected by the electronic device, to obtain the pose data of the electronic device, wherein the motion data comprises motion speed data and motion direction data.
  16. The method according to any one of claims 12 to 15, wherein the method further comprises:
    updating a SLAM map of the SLAM system according to the pose data of the electronic device.

  17. The method according to claim 16, wherein before the updating a SLAM map of the SLAM system according to the pose data of the electronic device, the method further comprises:
    determining, according to a K-th frame of the video images captured by the camera and a SLAM map under a first coordinate system, first pose data of the electronic device in the SLAM map under the first coordinate system, wherein K is an integer greater than or equal to 1;
    determining, according to the K-th frame and the global submap under a second coordinate system, second pose data of the electronic device in the global submap under the second coordinate system;
    obtaining, according to the first pose data and the second pose data, coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map;
    transforming, according to the coordinate system transformation information, the SLAM map under the first coordinate system into a SLAM map under the second coordinate system;
    correspondingly, the updating a SLAM map of the SLAM system according to the pose data of the electronic device comprises:
    updating the SLAM map under the second coordinate system by using the pose data of the electronic device as pose data in the SLAM map under the second coordinate system.
  18. The method according to any one of claims 12 to 17, wherein the obtaining a global submap of the global map comprises:
    sending, to a server, first location fingerprint information indicating an initial position of the electronic device;
    receiving the global submap from the server, wherein the global submap corresponds to second location fingerprint information, and the first location fingerprint information matches the second location fingerprint information.

  19. The method according to any one of claims 12 to 18, wherein the virtual object is a virtual object in a virtual reality (VR) scene, an augmented reality (AR) scene, or a mixed reality (MR) scene.
  20. An electronic device for virtual object display, comprising:
    a display component; a camera; one or more processors; a memory; a plurality of application programs; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprise instructions, and when the instructions are executed by the electronic device, the electronic device is caused to perform the following steps:
    detecting an operation of a user opening an application;
    in response to the operation, downloading a global submap and storing it into a simultaneous localization and mapping (SLAM) system of the electronic device, wherein the global submap is a submap of a global map corresponding to a position of the electronic device;
    displaying a position and pose of the virtual object on the display component, wherein the position and pose of the virtual object are obtained by the SLAM system performing pose calculation at least according to video images captured by the camera and the global submap.
  21. The electronic device according to claim 20, wherein
    pose data of the electronic device is used to characterize the position and pose of the virtual object, and the pose data of the electronic device is obtained by the SLAM system performing pose calculation at a first frequency at least according to the video images captured by the camera and the global submap.

  22. The electronic device according to claim 20 or 21, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following step:
    performing pose calculation according to the video images captured by the camera, the global submap, and motion data collected by the electronic device, to obtain the pose data of the electronic device, wherein the motion data comprises motion speed data and motion direction data.
  23. The electronic device according to any one of claims 20 to 22, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following steps:
    in response to the operation, sending indication information of an initial position of the electronic device to a server;
    receiving the global submap from the server, wherein the global submap is determined according to the initial position of the electronic device.

  24. The electronic device according to claim 23, wherein the indication information of the initial position of the electronic device comprises first location fingerprint information indicating the initial position of the electronic device; the global submap corresponds to second location fingerprint information, and the first location fingerprint information matches the second location fingerprint information.
  25. The electronic device according to any one of claims 20 to 24, wherein when the instructions are executed by the electronic device, the electronic device is caused to further perform the following step:
    updating a SLAM map of the SLAM system according to the pose data of the electronic device.

  26. The electronic device according to claim 25, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following steps:
    determining, according to a K-th frame of the video images captured by the camera and a SLAM map under a first coordinate system, first pose data of the electronic device in the SLAM map under the first coordinate system, wherein K is an integer greater than or equal to 1;
    determining, according to the K-th frame and the global submap under a second coordinate system, second pose data of the electronic device in the global submap under the second coordinate system;
    obtaining, according to the first pose data and the second pose data, coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map;
    transforming, according to the coordinate system transformation information, the SLAM map under the first coordinate system into a SLAM map under the second coordinate system;
    updating the SLAM map under the second coordinate system by using the pose data of the electronic device as pose data in the SLAM map under the second coordinate system.
  27. The electronic device according to claim 26, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following step:
    obtaining, according to the K-th frame, the SLAM map under the first coordinate system, and motion data collected by the electronic device, the first pose data of the electronic device in the SLAM map under the first coordinate system, wherein the motion data comprises motion speed data and motion direction data.

  28. The electronic device according to claim 26 or 27, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following steps:
    performing feature extraction on the K-th frame to obtain image features;
    performing feature matching with the image features in the global submap under the second coordinate system to obtain map features matching the image features;
    calculating, according to the image features and the map features, the second pose data of the electronic device in the global submap under the second coordinate system.

  29. The electronic device according to claim 26 or 27, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following steps:
    sending the K-th frame to a server;
    receiving the second pose data from the server, wherein the second pose data is determined by the server by performing feature extraction and feature matching according to the K-th frame and the global submap under the second coordinate system.
  30. The electronic device according to any one of claims 20 to 29, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following step:
    displaying a first interface on the display component, and displaying a video stream and the virtual object on the first interface, wherein the position and pose of the virtual object relative to the video stream are displayed based on pose data of the electronic device, and the pose data of the electronic device is obtained by performing the pose calculation process at least according to the video images captured by the camera and the global submap.
  31. An electronic device for virtual object display, comprising:
    a communication module, configured to obtain a global submap and store it into a simultaneous localization and mapping (SLAM) module of the electronic device, wherein the global submap is a submap of a global map corresponding to a position of the electronic device;
    the SLAM module, configured to perform pose calculation according to video images captured by a data collection module and the global submap, to obtain pose data of the electronic device;
    an interaction module, configured to display the virtual object on a display component based on the pose data of the electronic device.
  32. The electronic device according to claim 31, wherein the SLAM module is specifically configured to:
    perform pose calculation at a first frequency at least according to the video images captured by the data collection module and the global submap, to obtain the pose data of the electronic device.

  33. The electronic device according to claim 31 or 32, wherein the pose data of the electronic device is pose data of the electronic device under a first coordinate system or pose data of the electronic device under a second coordinate system; the first coordinate system is the coordinate system of the SLAM map of the SLAM module, and the second coordinate system is the coordinate system of the global submap.

  34. The electronic device according to any one of claims 31 to 33, wherein the SLAM module is specifically configured to:
    perform pose calculation according to the video images captured by the data collection module, the global submap, and motion data collected by the data collection module, to obtain the pose data of the electronic device, wherein the motion data comprises motion speed data and motion direction data.

  35. The electronic device according to any one of claims 31 to 34, wherein the SLAM module is further configured to update a SLAM map of the SLAM module according to the pose data of the electronic device.
  36. The electronic device according to claim 35, wherein the electronic device further comprises a global positioning module and a coordinate system transformation matrix calculation module;
    the SLAM module is specifically configured to determine, according to a K-th frame of the video images captured by the data collection module and a SLAM map under a first coordinate system, first pose data of the electronic device in the SLAM map under the first coordinate system, wherein K is an integer greater than or equal to 1;
    the global positioning module is specifically configured to determine, according to the K-th frame and the global submap under a second coordinate system, second pose data of the electronic device in the global submap under the second coordinate system;
    the coordinate system transformation matrix calculation module is specifically configured to obtain, according to the first pose data and the second pose data, coordinate system transformation information between the first coordinate system of the SLAM map and the second coordinate system of the global map;
    the SLAM module is further configured to transform, according to the coordinate system transformation information, the SLAM map under the first coordinate system into a SLAM map under the second coordinate system, and to update the SLAM map under the second coordinate system by using the pose data of the electronic device as pose data in the SLAM map under the second coordinate system.
  37. The electronic device according to any one of claims 31 to 36, wherein the communication module is further configured to:
    send, to a server, first location fingerprint information indicating an initial position of the electronic device; and receive the global submap from the server, wherein the global submap corresponds to second location fingerprint information, and the first location fingerprint information matches the second location fingerprint information.

  38. The electronic device according to any one of claims 31 to 37, wherein the virtual object is a virtual object in a virtual reality (VR) scene, an augmented reality (AR) scene, or a mixed reality (MR) scene.

  39. A computer storage medium, comprising computer instructions, wherein when the computer instructions are run on an electronic device, the electronic device is caused to perform the virtual object display method according to any one of claims 1 to 11 or the virtual object display method according to any one of claims 12 to 19.
PCT/CN2020/113340 2019-11-08 2020-09-03 Virtual object display method and electronic device WO2021088498A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20885155.0A EP4030391A4 (en) 2019-11-08 2020-09-03 VIRTUAL OBJECT DISPLAY METHOD AND ELECTRONIC DEVICE
US17/718,734 US11776151B2 (en) 2019-11-08 2022-04-12 Method for displaying virtual object and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911092326.0 2019-11-08
CN201911092326.0A CN112785715B (zh) 2019-11-08 Virtual object display method and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/718,734 Continuation US11776151B2 (en) 2019-11-08 2022-04-12 Method for displaying virtual object and electronic device

Publications (1)

Publication Number Publication Date
WO2021088498A1 true WO2021088498A1 (zh) 2021-05-14

Family

ID=75749468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113340 WO2021088498A1 (zh) 2019-11-08 2020-09-03 虚拟物体显示方法以及电子设备

Country Status (3)

Country Link
US (1) US11776151B2 (zh)
EP (1) EP4030391A4 (zh)
WO (1) WO2021088498A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI809538B (zh) * 2021-10-22 2023-07-21 National Taipei University of Technology Disinfection trajectory positioning system combining augmented reality and method thereof

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111674800B (zh) * 2020-06-03 2021-07-09 灵动科技(北京)有限公司 用于自动驾驶***的智能仓储技术
US20220148268A1 (en) * 2020-11-10 2022-05-12 Noderix Teknoloji Sanayi Ticaret Anonim Sirketi Systems and methods for personalized and interactive extended reality experiences
US11933621B2 (en) * 2021-10-06 2024-03-19 Qualcomm Incorporated Providing a location of an object of interest
CN115439625B (zh) * 2022-11-08 2023-02-03 成都云中楼阁科技有限公司 建筑草图辅助绘制方法、装置、存储介质及绘制设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170336511A1 (en) * 2016-05-18 2017-11-23 Google Inc. System and method for concurrent odometry and mapping
CN108629843A (zh) * 2017-03-24 2018-10-09 成都理想境界科技有限公司 一种实现增强现实的方法及设备
CN108765563A (zh) * 2018-05-31 2018-11-06 北京百度网讯科技有限公司 基于ar的slam算法的处理方法、装置及设备
CN109636916A (zh) * 2018-07-17 2019-04-16 北京理工大学 一种动态标定的大范围虚拟现实漫游***及方法
CN109949422A (zh) * 2018-10-15 2019-06-28 华为技术有限公司 用于虚拟场景的数据处理方法以及设备

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL208600A (en) * 2010-10-10 2016-07-31 Rafael Advanced Defense Systems Ltd Real-time network-based laminated reality for mobile devices
US20120306850A1 (en) * 2011-06-02 2012-12-06 Microsoft Corporation Distributed asynchronous localization and mapping for augmented reality
US20140323148A1 (en) * 2013-04-30 2014-10-30 Qualcomm Incorporated Wide area localization from slam maps
US20150371440A1 (en) * 2014-06-19 2015-12-24 Qualcomm Incorporated Zero-baseline 3d map initialization
GB201621903D0 (en) 2016-12-21 2017-02-01 Blue Vision Labs Uk Ltd Localisation
US10659768B2 (en) 2017-02-28 2020-05-19 Mitsubishi Electric Research Laboratories, Inc. System and method for virtually-augmented visual simultaneous localization and mapping
GB201705767D0 (en) * 2017-04-10 2017-05-24 Blue Vision Labs Uk Ltd Co-localisation
CN109671118B (zh) 2018-11-02 2021-05-28 北京盈迪曼德科技有限公司 Virtual reality multi-user interaction method, apparatus, and system
CN109767499A (zh) 2018-12-29 2019-05-17 北京诺亦腾科技有限公司 MR device-based multi-user immersive interaction method, system, and storage medium


Also Published As

Publication number Publication date
CN112785715A (zh) 2021-05-11
US11776151B2 (en) 2023-10-03
US20220237816A1 (en) 2022-07-28
EP4030391A4 (en) 2022-11-16
EP4030391A1 (en) 2022-07-20

Similar Documents

Publication Publication Date Title
WO2021088498A1 (zh) Virtual object display method and electronic device
US11158083B2 (en) Position and attitude determining method and apparatus, smart device, and storage medium
WO2019223468A1 (zh) Camera pose tracking method, apparatus, device, and system
US20230239567A1 (en) Wearable Multimedia Device and Cloud Computing Platform with Application Ecosystem
US11380012B2 (en) Method and apparatus for visual positioning based on mobile edge computing
CN110986930B (zh) Device positioning method and apparatus, electronic device, and storage medium
CN111026314B (zh) Method for controlling a display device, and portable device
JP7487321B2 (ja) Positioning method and apparatus, electronic device, storage medium, computer program product, and computer program
WO2021088497A1 (zh) Virtual object display method, global map update method, and device
CN114332423A (zh) Virtual reality controller tracking method, terminal, and computer-readable storage medium
CN114092655A (zh) Map construction method, apparatus, device, and storage medium
WO2019134305A1 (zh) Pose determination method and apparatus, smart device, storage medium, and program product
CN111928861B (zh) Map construction method and apparatus
CN112835021B (zh) Positioning method, apparatus, system, and computer-readable storage medium
CN112785715B (zh) Virtual object display method and electronic device
WO2022252337A1 (zh) 3D map encoding and decoding method and apparatus
CN116152075A (zh) Illumination estimation method, apparatus, and system
WO2019233299A1 (zh) Map construction method and apparatus, and computer-readable storage medium
WO2022252237A1 (zh) 3D map encoding and decoding method and apparatus
US20230418072A1 (en) Positioning method, apparatus, electronic device, head-mounted display device, and storage medium
CN111798358B (zh) Method and apparatus for determining route calculation time, electronic device, and readable storage medium
WO2022252236A1 (zh) 3D map encoding and decoding method and apparatus
WO2024057779A1 (ja) Information processing apparatus, program, and information processing system
CN117635697A (zh) Pose determination method, apparatus, device, storage medium, and program product
CN116502382A (zh) Sensor data processing method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20885155

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020885155

Country of ref document: EP

Effective date: 20220411

NENP Non-entry into the national phase

Ref country code: DE