CN117115244A - Cloud repositioning method, device and storage medium - Google Patents

Cloud repositioning method, device and storage medium

Info

Publication number
CN117115244A
Authority
CN
China
Prior art keywords
image
pose
point
terminal
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210519292.4A
Other languages
Chinese (zh)
Inventor
李永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202210519292.4A priority Critical patent/CN117115244A/en
Publication of CN117115244A publication Critical patent/CN117115244A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a cloud repositioning method, a cloud repositioning device and a storage medium. A server receives repositioning request data sent by a terminal, the repositioning request data including a real-time image acquired by the terminal and a first camera parameter corresponding to the real-time image; performs feature extraction on the real-time image through a first pre-trained deep learning model to obtain global image features and local image features; performs feature matching between the real-time image and a plurality of preset images according to the global image features to obtain matched images, where the preset images are images used to create an Augmented Reality (AR) point cloud map; and repositions a first pose of the terminal in the AR point cloud map according to the local image features, the matched images and the first camera parameter.

Description

Cloud repositioning method, device and storage medium
Technical Field
The present disclosure relates to the technical field of augmented reality (AR) cloud visual repositioning, and in particular to a cloud repositioning method, device and storage medium.
Background
At present, the AR (Augmented Reality) cloud is one of the hottest technical directions in the AR field and one of the decisive technologies for whether AR devices (such as AR mobile phones or AR glasses) can mature. Technology companies such as Google, Niantic and Apple have released AR cloud SDKs (Software Development Kits) or products, and domestic research teams have also released related AR cloud products. Cloud visual repositioning based on the AR cloud can be applied to various AR application scenarios, such as persistent display of AR virtual content and large-scale tracking of terminals.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a cloud repositioning method, a cloud repositioning device and a storage medium.
According to a first aspect of an embodiment of the present disclosure, there is provided a cloud relocation method, applied to a server, the method including:
receiving repositioning request data sent by a terminal, wherein the repositioning request data comprises a real-time image acquired by the terminal and a first camera parameter corresponding to the real-time image;
performing feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features;
respectively performing feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, wherein the preset image is an image for creating an Augmented Reality (AR) point cloud map;
and repositioning a first pose of the terminal in the AR point cloud map according to the local image features, the matching image and the first camera parameters.
Optionally, the first pre-trained deep learning model comprises a first model and a second model; the step of extracting features of the real-time image through the first pre-training deep learning model to obtain global image features and local image features comprises the following steps:
And extracting features of the real-time image through the first model to obtain the global image features, and extracting features of the real-time image through the second model to obtain the local image features.
Optionally, the performing feature matching on the real-time image and the plurality of preset images according to the global image feature, to obtain a matched image includes:
and outputting the matched images which are consistent with the global image features of the real-time image in the preset images through a K-nearest neighbor algorithm according to the global image features.
Optionally, repositioning the first pose of the terminal in the AR point cloud map according to the local image feature, the matching image, and the first camera parameter includes:
registering the real-time image and each matching image according to the local image features to obtain a first feature point pair with successful registration;
and repositioning the first pose of the terminal in the AR point cloud map according to the first characteristic point pair and the first camera parameter.
Optionally, the local image feature includes a plurality of feature points, and the registering the real-time image with each of the matching images according to the local image feature, to obtain a first feature point pair with successful registration includes:
And inputting a plurality of feature points corresponding to the real-time image and a plurality of feature points corresponding to the matching image into a second pre-training deep learning model aiming at each matching image to obtain the first feature point pair which is successfully registered.
Optionally, the first feature point pair includes a first feature point of the real-time image and a second feature point of the matching image; the repositioning the first pose of the terminal in the AR point cloud map according to the first feature point pair and the first camera parameter includes:
for each first feature point pair, determining a first 3D point of a first feature point in the AR point cloud map according to a second feature point in the first feature point pair;
and calculating a first pose of the terminal in the AR point cloud map through a preset pose algorithm according to a plurality of first feature point pairs, the first 3D points corresponding to each first feature point and the first camera parameters.
Optionally, after repositioning the first pose of the terminal in the AR point cloud map according to the local image feature, the matching image, and the first camera parameter, the method further includes:
And sending the first pose to the terminal so that the terminal can determine the mapping relation between the AR point cloud map and the local world coordinate system of the terminal according to the first pose and the second pose, wherein the second pose is the pose of the terminal relative to the local world coordinate system, calculated by the terminal through a simultaneous localization and mapping (SLAM) tracking algorithm.
Optionally, the AR point cloud map is pre-created by:
receiving environment data sent by the terminal, wherein the environment data comprises multi-frame environment images of the current environment acquired by the terminal and image parameters corresponding to each frame of environment image respectively;
for each frame of environment image, extracting global features and local features corresponding to the frame of environment image by adopting the first pre-training deep learning model, and determining a preset number of similar images corresponding to the frame of environment image from other images according to the global features corresponding to the frame of environment image;
for each frame of similar image, carrying out image registration on the frame of the environmental image and the frame of similar image according to the local features of the frame of the environmental image and the local features of the frame of similar image to obtain a registration result, wherein the registration result comprises a plurality of second feature point pairs which are successfully registered;
Determining second 3D points corresponding to each second feature point respectively through a preset depth estimation algorithm (such as a triangulation algorithm, a depth filtering algorithm and the like) according to the plurality of second feature point pairs, the image parameters of the frame environment image and the image parameters of the frame similar image;
and generating the AR point cloud map according to the second 3D points.
Optionally, the image parameters include camera shooting parameters and terminal pose of the terminal relative to a local world coordinate system when the frame of environment image is acquired; the determining, according to the plurality of second feature point pairs, the image parameters of the frame environment image and the image parameters of the frame similar image, the second 3D point corresponding to each second feature point pair through a preset depth estimation model includes:
according to the terminal pose corresponding to the frame environment image and the terminal pose corresponding to the frame similar image, calculating to obtain a first pose conversion matrix between cameras corresponding to the two images respectively;
and inputting a plurality of second characteristic point pairs, camera shooting parameters of the frame environment image, camera shooting parameters of the frame similar image and the first pose conversion matrix into the preset depth estimation model to obtain the second 3D points corresponding to each second characteristic point pair respectively.
Optionally, after the second 3D point corresponding to each second feature point pair is determined through a preset depth estimation model according to the plurality of second feature point pairs, the image parameters of the frame environment image and the image parameters of the frame similar image, the method further includes:
re-projecting the second 3D points to a designated image aiming at each second 3D point to obtain projection points, wherein the designated image comprises the frame environment image or the frame similar image;
determining whether the second 3D point meets a preset error threshold condition according to the projection point and the initial characteristic point corresponding to the second 3D point in the designated image;
deleting the second 3D point under the condition that the second 3D point does not meet the preset error threshold condition, so as to screen the second 3D point to obtain a target 3D point;
the generating the AR point cloud map from the second 3D point includes:
and generating the AR point cloud map according to the target 3D point.
Optionally, the method further comprises:
for each second 3D point, carrying out parameter optimization on the second 3D point through a preset parameter optimization algorithm to obtain optimized 3D points;
and carrying out position adjustment on the AR point cloud map according to the optimized 3D points.
Optionally, after the generating the AR point cloud map according to the second 3D point, the method further includes:
determining whether each two point cloud maps correspond to a public picture according to each two point cloud maps in the plurality of AR point cloud maps; the public picture is a picture representing a repeated public area between the two point cloud maps;
and under the condition that the two point cloud maps correspond to the public picture, merging the two point cloud maps to obtain a target point cloud map.
Optionally, the two point cloud maps include a first point cloud map and a second point cloud map, the common picture includes a plurality of pictures for creating the first point cloud map, and the merging the two point cloud maps to obtain the target point cloud map includes:
aiming at each public picture, positioning a camera for acquiring the public picture at a third pose of the second point cloud map;
determining a second pose conversion matrix between the first point cloud map and the second point cloud map according to the third pose and a fourth pose of a camera for acquiring the public picture relative to the first point cloud map;
and merging the two point cloud maps according to the second pose conversion matrix to obtain a target point cloud map.
Optionally, after the generating the AR point cloud map according to the second 3D point, the method further includes:
performing densification processing on the AR point cloud map to obtain a dense point cloud map;
and obtaining object parameters corresponding to a preset AR object, and adding the preset AR object into the dense point cloud map according to the object parameters.
Optionally, after the generating the AR point cloud map according to the second 3D point, the method further includes:
receiving a target image sent by the terminal and a second camera parameter when the terminal collects the target image, wherein the target image is an image to which a target AR object needs to be added;
repositioning the pose of the terminal relative to the AR point cloud map according to the target image and the second camera parameter to obtain a repositioning pose;
sending the repositioning pose to the terminal, so that the terminal determines an eighth pose of the target AR object relative to the AR point cloud map according to the repositioning pose and the seventh pose under the condition that the seventh pose of the target AR object relative to a local world coordinate system of the terminal is acquired, wherein the target AR object is an AR object added by a user on the target image;
And receiving the target AR object and the eighth pose sent by the terminal, and adding the target AR object in the AR point cloud map according to the eighth pose.
According to a second aspect of the embodiments of the present disclosure, there is provided a cloud relocation method, applied to a terminal, the method including:
acquiring a real-time image and acquiring a first camera parameter when the real-time image is acquired;
transmitting repositioning request data to a server, wherein the repositioning request data comprises the real-time image and the first camera parameters, so that the server performs feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features, respectively performs feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, and repositioning a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameters;
and receiving a repositioning result sent by the server, wherein the repositioning result comprises the first pose.
Optionally, the method further comprises:
Determining a mapping relation between the AR point cloud map and a local world coordinate system of the terminal according to the first pose and the second pose, wherein the second pose is a pose of the terminal relative to the local world coordinate system, which is calculated by the terminal through a SLAM tracking algorithm;
receiving a fifth pose of the preset AR object, which is sent by the server, relative to the AR point cloud map;
determining a sixth pose of the preset AR object relative to the local world coordinate system according to the mapping relation and the fifth pose;
and rendering the preset AR object on the real-time image according to the sixth pose so as to add the preset AR object on the real-time image.
Optionally, the repositioning result further includes a reprojection constraint relationship, where the reprojection constraint relationship is a corresponding relationship between a target feature point of the real-time image and a 3D point corresponding to the target feature point on the AR point cloud map; the re-projection constraint relation comprises a plurality of target 3D points and target feature points corresponding to the target 3D points respectively; the method further comprises the steps of:
re-projecting the target 3D point to the real-time image through a preset re-projection algorithm aiming at each target 3D point to obtain a re-projection point corresponding to the target 3D point on the real-time image;
Determining a residual equation according to the target characteristic point and the re-projection point corresponding to the target 3D point;
and taking the residual equation as a residual constraint term of the SLAM tracking algorithm to perform parameter optimization on the camera pose calculated by the SLAM tracking algorithm, so as to obtain the optimized camera pose.
Optionally, the method further comprises:
sending a target image and a second camera parameter when the terminal acquires the target image to a server, wherein the target image is an image to which a target AR object needs to be added;
receiving a repositioning pose of the terminal in the AR point cloud map, which is sent by the server and is determined by the server according to the target image and the second camera parameter;
acquiring a seventh pose of the target AR object added by a user on the target image relative to a local world coordinate system of the terminal;
determining an eighth pose of the target AR object relative to the AR point cloud map according to the repositioning pose and the seventh pose;
and sending the target AR object and the eighth pose to the server, so that the server adds the target AR object in the AR point cloud map according to the eighth pose.
According to a third aspect of embodiments of the present disclosure, there is provided a cloud relocation apparatus, applied to a server, the apparatus including:
the terminal comprises a first receiving module and a second receiving module, wherein the first receiving module is configured to receive relocation request data sent by a terminal, and the relocation request data comprises a real-time image acquired by the terminal and a first camera parameter corresponding to the real-time image;
the feature extraction module is configured to extract features of the real-time image through a first pre-training deep learning model to obtain global image features and local image features;
the image matching module is configured to respectively perform feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, wherein the preset image is an image for creating an Augmented Reality (AR) point cloud map;
and the repositioning module is configured to reposition the first pose of the terminal in the AR point cloud map according to the local image characteristics, the matching image and the first camera parameters.
According to a fourth aspect of embodiments of the present disclosure, there is provided a cloud repositioning device, applied to a terminal, the device including:
the data acquisition module is configured to acquire a real-time image and acquire a first camera parameter when the real-time image is acquired;
The second sending module is configured to send repositioning request data to a server, wherein the repositioning request data comprises the real-time image and the first camera parameters, so that the server performs feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features, performs feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, and repositioning a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameters;
and the second receiving module is configured to receive a repositioning result sent by the server, wherein the repositioning result comprises the first pose.
According to a fifth aspect of embodiments of the present disclosure, there is provided a cloud relocation apparatus, applied to a server, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving repositioning request data sent by a terminal, wherein the repositioning request data comprises a real-time image acquired by the terminal and a first camera parameter corresponding to the real-time image;
Performing feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features;
respectively performing feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, wherein the preset image is an image for creating an Augmented Reality (AR) point cloud map;
and repositioning a first pose of the terminal in the AR point cloud map according to the local image features, the matching image and the first camera parameters.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a cloud repositioning device, applied to a terminal, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a real-time image and acquiring a first camera parameter when the real-time image is acquired;
transmitting repositioning request data to a server, wherein the repositioning request data comprises the real-time image and the first camera parameters, so that the server performs feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features, respectively performs feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, and repositioning a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameters;
And receiving a repositioning result sent by the server, wherein the repositioning result comprises the first pose.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the cloud relocation method provided by the first aspect of the present disclosure.
According to an eighth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the cloud relocation method provided by the second aspect of the present disclosure.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects. The server receives repositioning request data sent by the terminal, the repositioning request data including a real-time image collected by the terminal and a first camera parameter corresponding to the real-time image; performs feature extraction on the real-time image through a first pre-trained deep learning model to obtain global image features and local image features; performs feature matching between the real-time image and a plurality of preset images according to the global image features to obtain matched images, where the preset images are images used to create an Augmented Reality (AR) point cloud map; and repositions a first pose of the terminal in the AR point cloud map according to the local image features, the matched images and the first camera parameter. Compared with directly performing image registration against the plurality of preset images based on hand-crafted local feature points, first screening the preset images by the global image features of the real-time image to obtain the matched images and then performing image registration against the matched images by the local image features yields a more accurate repositioning result and relatively higher repositioning efficiency. Furthermore, because the global and local image features in this method are extracted by pre-trained deep learning models, the repositioning result has better stability and accuracy than registration based on hand-crafted feature points.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart illustrating a cloud relocation method according to an exemplary embodiment.
Fig. 2 is a flowchart of a cloud relocation method according to the embodiment shown in fig. 1.
Fig. 3 is a flowchart illustrating a method of creating an AR point cloud map according to an example embodiment.
Fig. 4 is a schematic diagram illustrating a process of creating an AR point cloud map according to an example embodiment.
Fig. 5 is a flowchart illustrating a method of creating an AR point cloud map according to the embodiment shown in fig. 3.
Fig. 6 is a flowchart illustrating a method of creating an AR point cloud map according to the embodiment shown in fig. 3.
Fig. 7 is a flow chart illustrating a cloud relocation method according to an example embodiment.
FIG. 8 is a flow chart illustrating a method of AR persistence display in accordance with the embodiment illustrated in FIG. 7.
FIG. 9 is a process diagram illustrating an AR persistence display in accordance with an exemplary embodiment.
FIG. 10 is a flow chart illustrating a method for implementing large-scale tracking based on relocation of an AR point cloud map according to the embodiment shown in FIG. 7.
FIG. 11 is a schematic diagram illustrating a process for implementing extensive tracking based on relocation of an AR point cloud map, according to an example embodiment.
Fig. 12 is a block diagram illustrating a cloud relocation apparatus according to an example embodiment.
Fig. 13 is a block diagram of a cloud relocation apparatus according to the embodiment shown in fig. 12.
Fig. 14 is a block diagram illustrating a cloud relocation apparatus according to an example embodiment.
Fig. 15 is a block diagram of a cloud relocation apparatus according to the embodiment shown in fig. 14.
Fig. 16 is a block diagram of a cloud relocation apparatus according to the embodiment shown in fig. 14.
Fig. 17 is a block diagram of a cloud relocation apparatus according to the embodiment shown in fig. 14.
Fig. 18 is a block diagram illustrating an apparatus for cloud relocation according to an example embodiment.
Fig. 19 is a block diagram illustrating an apparatus for cloud relocation according to an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that all actions of acquiring signals, information or data in the present application are performed in compliance with the applicable data protection regulations and policies of the country where the data is located, and with the authorization of the owner of the corresponding device.
The present disclosure relates generally to augmented reality (AR) cloud visual repositioning technology. Cloud visual repositioning based on the AR cloud can be applied to many AR application scenarios, such as persistent display of AR virtual content and large-scale tracking of a terminal. The cloud visual repositioning technology can also be applied to the field of robotics: when a robot performs autonomous navigation, it needs to know its initial position during the initialization stage, and this initial position can be provided by cloud visual repositioning; when the tracking range is particularly large, cloud visual repositioning can effectively reduce global tracking errors and improve the accuracy of the robot's autonomous navigation. In addition, repositioning based on a cloud high-precision map can also provide a high-precision real-time pose for an autonomous vehicle in real time.
In the related art, when visual repositioning is performed based on a pre-created AR point cloud map, hand-crafted local feature points of the query image (such as SIFT or ORB feature points) are often registered directly against the images corresponding to the AR point cloud map. However, for images with weak illumination or weak texture, directly registering hand-crafted feature points yields low repositioning accuracy and poor repositioning stability.
In addition, in the related art, creating the AR cloud map requires dedicated devices (such as radar or a panoramic camera) to collect the image data used for map creation, which is inconvenient and costly.
To solve the above problems, the present disclosure provides a cloud repositioning method, device and storage medium. After receiving repositioning request data sent by a terminal, a server may perform feature extraction on the real-time image in the repositioning request data through a deep learning model to obtain a global image feature and local image features, screen a plurality of preset images according to the global image feature of the real-time image to obtain matched images, and then perform image registration against the matched images according to the local image features, thereby repositioning the first pose of the terminal in the AR point cloud map.
In addition, when the AR point cloud map is created, a local terminal (such as an AR mobile phone or AR glasses) can be used directly to collect images of the terminal's current environment and send them to the server, so that the server creates the AR point cloud map from these environment images without any additional image acquisition equipment.
The following detailed description of specific embodiments of the present disclosure refers to the accompanying drawings.
Fig. 1 is a flowchart of a cloud repositioning method according to an exemplary embodiment. The method is applied to a server and, as shown in fig. 1, includes the following steps:
in step S101, relocation request data sent by a terminal is received, where the relocation request data includes a real-time image collected by the terminal and a first camera parameter corresponding to the real-time image.
The terminal refers to a terminal with an AR function, such as an AR mobile phone or AR glasses.
The local AR terminal tracks the local environment in real time by running a SLAM (Simultaneous Localization and Mapping) tracking system and establishes a local world coordinate system at initialization. The terminal captures real-time images of the surrounding environment through its camera and sends them to the server. For each real-time image, the terminal also records the camera intrinsics at the moment of capture, including the focal length and the position of the optical center, as well as the camera pose of the terminal (or of the camera) relative to the local world coordinate system. The camera pose (also called the terminal pose) refers to the 6-degree-of-freedom pose of the terminal in the local world coordinate system calculated by the SLAM tracking system, which includes the spatial position of the terminal in the local world coordinate system and its rotation, for example the roll, pitch and yaw angles. When sending a real-time image to the server, the terminal also sends the corresponding camera intrinsics and camera pose; that is, the first camera parameter includes the camera intrinsics and the camera pose.
It should be noted that, the terminal may send the relocation request data to the server at intervals of a preset time.
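For illustration only, the following sketch shows one way the repositioning request data of step S101 could be organized on the terminal side; the class and field names are assumptions made for this example and are not defined by the present disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RepositioningRequest:
    """Repositioning request data sent by the terminal (step S101).

    Field names are illustrative assumptions, not part of the disclosure.
    """
    image: np.ndarray           # H x W x 3 real-time image captured by the camera
    intrinsics: np.ndarray      # 3 x 3 camera matrix (focal length, optical center)
    pose_world_cam: np.ndarray  # 4 x 4 camera pose in the terminal's local world
                                # coordinate system, reported by SLAM tracking

    def is_valid(self) -> bool:
        return (self.image.ndim == 3
                and self.intrinsics.shape == (3, 3)
                and self.pose_world_cam.shape == (4, 4))
```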
In step S102, feature extraction is performed on the real-time image through a first pre-trained deep learning model, so as to obtain global image features and local image features.
The first pre-trained deep learning model includes a first model and a second model. The first model is used to extract the global image feature of the real-time image; the global image feature (also called a global feature descriptor) is a feature description vector of the whole image, and the first model may be, for example, either of the deep learning models NetVLAD and SOLAR. The second model is used to extract the local image features of the real-time image; the local image features include a plurality of feature points, each consisting of a key point and a local feature descriptor (for example, SuperPoint feature points), and the second model may include, for example, a Repeatability model.
That is, in this step, the feature extraction may be performed on the real-time image through the first model to obtain the global image feature, and the feature extraction may be performed on the real-time image through the second model to obtain the local image feature.
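As a minimal sketch of step S102 (the present disclosure does not prescribe a particular framework), the two pre-trained models could be invoked as follows; `global_model` and `local_model` are assumed, already-loaded networks, for example a NetVLAD-like and a SuperPoint-like module respectively.

```python
import torch

def extract_features(image: torch.Tensor, global_model, local_model):
    """Sketch of step S102: run both pre-trained models on one real-time image.

    Assumed interfaces (not defined by the disclosure):
      global_model(image) -> (1, Dg) descriptor for the whole image
      local_model(image)  -> ((N, 2) keypoints, (N, Dl) local descriptors)
    """
    with torch.no_grad():
        global_desc = torch.nn.functional.normalize(global_model(image), dim=-1)
        keypoints, local_desc = local_model(image)
    return global_desc.squeeze(0), keypoints, local_desc
```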
In step S103, the real-time image and a plurality of preset images are respectively subjected to feature matching according to the global image features to obtain a matching image, wherein the preset image is an image for creating an augmented reality AR point cloud map.
The preset image is an environment image acquired through the AR terminal and used for creating the AR point cloud map in advance.
In this step, a K-nearest neighbor algorithm may be used to select, from the plurality of preset images and according to the global image feature, the matching images whose global image features are consistent with the global image feature of the real-time image.
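The following is a minimal sketch of the K-nearest-neighbor retrieval described above, assuming the global descriptors of the preset (map-building) images have been extracted in advance and stacked into a matrix; scikit-learn is used here purely as an example implementation of K-nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def retrieve_matching_images(query_desc, preset_descs, k=3):
    """Return the indices of the k preset images whose global descriptors are
    closest to the query descriptor (Euclidean distance on L2-normalized
    vectors, which preserves the cosine-similarity ranking)."""
    knn = NearestNeighbors(n_neighbors=k).fit(preset_descs)  # (M, D) descriptors
    dist, idx = knn.kneighbors(query_desc.reshape(1, -1))
    return idx[0], dist[0]

# Toy example: 1000 preset images with 256-D global descriptors
preset = np.random.randn(1000, 256).astype(np.float32)
preset /= np.linalg.norm(preset, axis=1, keepdims=True)
query = preset[42] + 0.01 * np.random.randn(256)
print(retrieve_matching_images(query, preset, k=3)[0])  # index 42 ranks first
```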
In step S104, the first pose of the terminal in the AR point cloud map is repositioned according to the local image features, the matching images, and the first camera parameter.
In the step, for each matching image, the real-time image and the matching image can be registered according to the local image characteristics, so as to obtain a first characteristic point pair with successful registration; and repositioning the first pose of the terminal in the AR point cloud map according to the first characteristic point pair and the first camera parameter.
The local image features comprise a plurality of feature points, and the first pose is similar to the camera pose and comprises a 6-degree-of-freedom pose of the terminal in the AR point cloud map.
Here, the real-time image and each of the matching images may be registered according to the local image features, so as to obtain a first feature point pair with successful registration:
and inputting a plurality of feature points corresponding to the real-time image and a plurality of feature points corresponding to the matching image into a second pre-training deep learning model aiming at each matching image to obtain the first feature point pair with successful registration.
The second pre-trained deep learning model may be, for example, a SuperGlue model. A plurality of first feature point pairs may be obtained; each first feature point pair includes a first feature point of the real-time image and a second feature point of the matching image, and the first feature points and the second feature points are in one-to-one correspondence.
In the actual registration process, key points and local feature descriptors corresponding to each feature point of the real-time image and the matched image can be input into the second pre-training deep learning model to obtain the first feature point pair with successful registration.
Each first feature point pair includes a first feature point of the real-time image and a second feature point of the matching image. Therefore, when repositioning the first pose of the terminal in the AR point cloud map according to the first feature point pairs and the first camera parameter, a first 3D point of the first feature point in the AR point cloud map may be determined, for each first feature point pair, according to the second feature point in that pair; the first pose of the terminal in the AR point cloud map is then calculated through a preset pose algorithm according to the plurality of first feature point pairs, the first 3D point corresponding to each first feature point, and the first camera parameter.
The preset pose algorithm may include, for example, a PnP (Perspective-n-Point) algorithm, and the first pose may be represented by a pose conversion matrix between the camera coordinate system corresponding to the real-time image and the point cloud map coordinate system (i.e., the anchor coordinate system).
The pre-constructed AR point cloud map corresponds to a plurality of data files, wherein a corresponding relation between each 3D point in the AR point cloud map and a 2D characteristic point (namely, a characteristic point of a local image characteristic in a preset image) on the preset image is stored in one data file, so that the server can directly read the 3D point in the AR point cloud map corresponding to each second characteristic point from the data file.
For example, denote the real-time image as A1. Suppose that after step S103, feature matching between the real-time image A1 and the plurality of preset images according to the global image feature of A1 yields three matching images B1, B2 and B3. Taking the local-feature registration of A1 and B1 as an example, suppose A1 contains 100 feature points and B1 contains 120 feature points; inputting these 100 and 120 feature points into the SuperGlue model yields 50 feature point pairs, each consisting of two mutually corresponding feature points (i.e., a first feature point and a second feature point). For each feature point pair, the 3D point in the AR point cloud map corresponding to the second feature point can be read by querying the data file corresponding to the AR point cloud map; since the first and second feature points correspond to each other, this 3D point is also the first 3D point corresponding to the first feature point, which yields a 2D-3D pair formed by the first feature point of the real-time image and its first 3D point in the map. Then, the plurality of first feature point pairs, the first 3D point corresponding to each first feature point and the first camera parameter may be used as the input of the PnP algorithm, and the first pose of the terminal in the AR point cloud map is calculated based on the PnP algorithm.
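As an illustrative sketch of the final pose solving in this example (the present disclosure specifies a PnP algorithm but not a particular implementation), OpenCV's RANSAC-based PnP solver can recover the first pose from the resulting 2D-3D pairs; the pose convention (map-to-camera transform) is an assumption of this sketch.

```python
import numpy as np
import cv2

def solve_first_pose(points_2d, points_3d, K):
    """Estimate the terminal's first pose in the AR point cloud map.

    points_2d: (N, 2) first feature points in the real-time image (pixels)
    points_3d: (N, 3) corresponding first 3D points in the map (anchor) frame
    K:         (3, 3) camera intrinsic matrix from the first camera parameter
    Returns a 4x4 transform taking map coordinates into camera coordinates.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float32), points_2d.astype(np.float32),
        K.astype(np.float32), None, reprojectionError=3.0)
    if not ok:
        raise RuntimeError("PnP failed: not enough consistent 2D-3D pairs")
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> 3x3 rotation matrix
    T_cam_map = np.eye(4)
    T_cam_map[:3, :3] = R
    T_cam_map[:3, 3] = tvec.ravel()
    return T_cam_map, inliers
```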
According to this method, after receiving the repositioning request data sent by the terminal, the server performs feature extraction on the real-time image in the repositioning request data through the deep learning models to obtain the global image feature and local image features, screens the plurality of preset images according to the global image feature of the real-time image to obtain the matched images, and then performs image registration against the matched images according to the local image features, thereby repositioning the first pose of the terminal in the AR point cloud map.
As mentioned above, in an actual AR application scenario, AR virtual content may be persistently displayed based on the repositioning result of the terminal in the AR point cloud map. Persistent display means that virtual content placed by a user at some moment can be visually relocated at any later time and redisplayed after its position has been accurately restored. The AR virtual content includes preset multimedia files such as text, pictures, video, 3D models, and 2D/3D special effects. To achieve persistent display of AR virtual content, the server needs to send the first pose to the terminal, so that the terminal performs persistent display of the AR virtual content according to the first pose.
Accordingly, fig. 2 is a flowchart of a cloud relocation method according to the embodiment shown in fig. 1, and as shown in fig. 2, the method further includes the steps of:
in step S105, the first pose is sent to the terminal, so that the terminal determines a mapping relationship between the AR point cloud map and a local world coordinate system of the terminal according to the first pose and the second pose, where the second pose is a pose of the terminal, calculated by the terminal through an instant positioning and mapping SLAM tracking algorithm, relative to the local world coordinate system.
After determining the mapping relationship, the terminal may further calculate a sixth pose of a preset AR object relative to the local world coordinate system according to the mapping relationship and a fifth pose, sent by the server, of the preset AR object relative to the AR point cloud map, and then render the preset AR object on the real-time image according to the sixth pose, so as to add the preset AR object to the real-time image and thereby achieve persistent display of the preset AR object.
The specific manner in which the terminal persistently displays the preset AR object on the terminal side according to the first pose, the second pose and the fifth pose of the preset AR object (sent by the server) relative to the AR point cloud map will be described in detail in the terminal-side embodiments and is not repeated here.
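Purely as a sketch of the coordinate-frame reasoning behind this step, and under the assumption that all poses are expressed as 4x4 homogeneous transforms with a camera-to-frame convention, the mapping relation and the sixth pose could be composed as follows; the function names are illustrative only.

```python
import numpy as np

def world_from_map(T_map_cam, T_world_cam):
    """Mapping relation between the AR point cloud map and the local world frame.

    T_map_cam:   first pose  - camera pose in the AR point cloud map (relocation)
    T_world_cam: second pose - camera pose in the local world frame (SLAM)
    Both refer to the same real-time image, so
        T_world_map = T_world_cam @ inv(T_map_cam)
    maps AR-map coordinates into local world coordinates.
    """
    return T_world_cam @ np.linalg.inv(T_map_cam)

def sixth_pose(T_world_map, T_map_obj):
    """Preset AR object in the local world frame (sixth pose), given its
    fifth pose T_map_obj in the AR point cloud map."""
    return T_world_map @ T_map_obj
```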
In addition, as mentioned above, in an actual AR application scenario, large-scale tracking of the terminal may also be implemented based on the AR cloud visual repositioning technology. Large-scale tracking means that when a user moves in a large-scale scene (generally an area exceeding one thousand square meters), centimeter-level positioning accuracy can still be achieved and an accurate AR virtual-real superposition effect is maintained throughout.
As described above, in the process of the server repositioning the first pose of the terminal in the AR point cloud map, a set of 2D-3D point pairs may be obtained, where the 2D point in each pair is a first feature point of the real-time image and the 3D point is the corresponding first 3D point in the AR point cloud map. This set of 2D-3D point pairs may serve as a re-projection constraint relationship. The server sends the re-projection constraint relationship to the terminal, so that the terminal may determine a residual equation according to it and then use the residual equation as a residual constraint term of the SLAM tracking algorithm to optimize the camera pose (i.e., the terminal pose) calculated by the SLAM tracking algorithm, thereby obtaining an optimized camera pose.
The specific manner of performing large-scale, high-precision tracking of the terminal based on the AR cloud visual repositioning technology will be described in detail in the corresponding terminal-side embodiments and is not repeated here.
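A minimal sketch of the re-projection constraint described above is given below, with an assumed pinhole projection and a map-to-camera pose convention; it only illustrates how one 2D-3D pair could contribute a residual term, not the terminal's actual SLAM integration.

```python
import numpy as np

def reprojection_residual(point_3d_map, observed_2d, T_cam_map, K):
    """Residual between an observed first feature point (pixels) and the
    projection of its first 3D map point under the current camera pose.

    T_cam_map: 4x4 transform taking map coordinates into camera coordinates
    K:         3x3 camera intrinsic matrix
    Returns a 2-vector (du, dv) usable as a residual constraint term.
    """
    p_cam = T_cam_map[:3, :3] @ point_3d_map + T_cam_map[:3, 3]
    uv = (K @ (p_cam / p_cam[2]))[:2]   # pinhole projection to pixel coordinates
    return uv - observed_2d
```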
The following describes a pre-creation process of the AR point cloud map.
Fig. 3 is a flowchart illustrating a method of pre-creating an AR point cloud map according to an exemplary embodiment. The creation process shown in fig. 3 is generally performed offline and, as shown in fig. 3, includes the following steps:
In step S301, environmental data sent by a terminal is received, where the environmental data includes multiple frame environmental images of the current environment collected by the terminal and image parameters corresponding to each frame of environmental image respectively.
The image parameters of each frame of environment image include the camera shooting parameters when the frame was shot and the terminal pose of the terminal relative to the local world coordinate system at that moment. The camera shooting parameters may include, for example, the focal length and the position of the optical center. The terminal pose refers to the 6-degree-of-freedom pose of the terminal in the local world coordinate system calculated by the SLAM tracking system, which includes the spatial position and the rotation of the terminal, for example the roll, pitch and yaw angles. The environment data may further include one or more of: a depth environment image acquired by the terminal through a depth camera, the three-axis attitude angles (or angular rates) and acceleration of the terminal measured by its IMU (Inertial Measurement Unit), and GPS positioning information of the terminal.
In step S302, for each frame of environmental image, the global feature and the local feature corresponding to the frame of environmental image are extracted by using the first pre-training deep learning model, and a preset number of similar images corresponding to the frame of environmental image are determined from other environmental images according to the global feature corresponding to the frame of environmental image.
The global feature is likewise a feature representing the whole image, and the local features may include a plurality of feature points. Similar to steps S102 and S103, in this step the first model of the first pre-trained deep learning model may be used to perform feature extraction on the frame of environment image to obtain the global feature, and the second model may be used to obtain the local features; then, K similar images corresponding to the frame of environment image may be determined through a K-nearest neighbor algorithm according to the global feature.
In step S303, for each frame of similar image, image registration is performed on the frame of environmental image and the frame of similar image according to the local feature of the frame of environmental image and the local feature of the frame of similar image, so as to obtain a registration result, where the registration result includes a plurality of second feature point pairs that are successfully registered.
For example, after the plurality of feature points of the frame of environment image and the plurality of feature points corresponding to the similar image are input into the SuperGlue model, the successfully registered second feature point pairs are obtained. Each second feature point pair includes a third feature point of the frame of environment image and a fourth feature point of the similar image, and the third and fourth feature points are also in one-to-one correspondence.
In step S304, according to the plurality of second feature point pairs, the image parameters of the frame environment image and the image parameters of the frame similar image, a preset depth estimation algorithm is used to determine second 3D points corresponding to each second feature point pair.
The preset depth estimation algorithm may include, for example, a triangulation algorithm, a depth filtering algorithm, and the like.
In this step, a first pose conversion matrix between the cameras corresponding to the two images can be calculated according to the terminal pose corresponding to the frame of environment image and the terminal pose corresponding to the similar image; the plurality of second feature point pairs, the camera shooting parameters of the two images and the first pose conversion matrix are then input into the preset depth estimation algorithm to obtain the second 3D point corresponding to each second feature point pair.
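As a sketch only (the present disclosure allows any depth estimation algorithm, such as triangulation or depth filtering), the following uses OpenCV triangulation and builds the two projection matrices directly from the recorded terminal poses rather than from an explicit relative pose matrix; the pose convention is an assumption of this sketch.

```python
import numpy as np
import cv2

def triangulate_pairs(pts_a, pts_b, K_a, K_b, T_world_cam_a, T_world_cam_b):
    """Recover second 3D points from second feature point pairs (step S304).

    pts_a, pts_b:    (N, 2) matched pixel coordinates in the environment image
                     and in its similar image
    K_a, K_b:        3x3 camera shooting parameters (intrinsics) of the two shots
    T_world_cam_*:   4x4 terminal poses recorded by SLAM when each image was taken
    """
    # Projection matrices P = K [R | t], with [R | t] mapping world -> camera
    P_a = K_a @ np.linalg.inv(T_world_cam_a)[:3, :]
    P_b = K_b @ np.linalg.inv(T_world_cam_b)[:3, :]
    X_h = cv2.triangulatePoints(P_a, P_b,
                                pts_a.T.astype(np.float64),
                                pts_b.T.astype(np.float64))  # 4 x N homogeneous
    return (X_h[:3] / X_h[3]).T                              # N x 3 points
```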
In step S305, the AR point cloud map is generated from the second 3D point.
As registration between pictures continues, second 3D points corresponding to more and more second feature point pairs are obtained, and these second 3D points accumulate and are superimposed to form the AR point cloud map.
For example, fig. 4 is a schematic diagram of a process of creating an AR point cloud map according to an exemplary embodiment. As shown in fig. 4, an AR terminal 101 (such as an AR mobile phone or AR glasses in the figure) collects pictures and other sensor data 102, an AR point cloud map 103 is generated, and preset AR virtual content 103-B is finally registered into the AR point cloud map; this is only for illustration and does not limit the present disclosure.
Considering that the second feature point pairs obtained when registering two images based on local features may contain errors, in order to improve the accuracy of the created AR point cloud map, after the second 3D points corresponding to the second feature point pairs are determined through the preset depth estimation algorithm according to the plurality of second feature point pairs, the image parameters of the frame of environment image and the image parameters of the similar image, each second 3D point may be verified, and the AR point cloud map may be adjusted according to the verification result. The specific implementation is shown in fig. 5.
FIG. 5 is a flow chart illustrating a method of creating an AR point cloud map according to the embodiment of FIG. 3, the method further comprising, as shown in FIG. 5:
in step S306, for each of the second 3D points, the second 3D point is re-projected to a specified image including the frame environment image or the frame-like image to obtain a projected point.
In this step, the second 3D point is re-projected onto the specified image based on a preset re-projection algorithm to obtain the projected point; the specific re-projection process may refer to the related literature and is not limited here.
In step S307, it is determined whether the second 3D point satisfies a preset error threshold condition according to the projection point and the initial feature point of the second 3D point corresponding to the specified image.
The preset error threshold condition may be, for example: the difference between the pixel values of the projection point and the initial feature point is less than or equal to a preset pixel value.
In step S308, if it is determined that the second 3D point does not meet the preset error threshold condition, the second 3D point is deleted, so as to screen the second 3D point to obtain a target 3D point.
Take re-projecting a second 3D point X onto the frame of environment image as an example. Denote the initial feature point of X in that image as x1, with pixel coordinates (100, 100), and suppose the projected point obtained by re-projecting X onto the image is x2, with pixel coordinates (100, 103). The projected point x2 then differs from the initial feature point x1 by 3 pixels. If the preset error threshold condition is that the difference between the pixel coordinates of the projected point and the initial feature point is less than or equal to 2 pixels, it can be determined from this 3-pixel difference that the second 3D point X does not satisfy the preset error threshold condition, and X needs to be deleted. After every second 3D point has been checked in this way, the remaining second 3D points are the target 3D points.
In this way, when step S305 is performed, the AR point cloud map may be generated from the target 3D point, so that the accuracy of the created AR point cloud map may be improved.
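A minimal sketch of the verification in steps S306 to S308 follows, using a pixel-distance threshold as the preset error threshold condition (consistent in spirit with the pixel-difference example above); the pose convention and the threshold value are assumptions.

```python
import numpy as np

def filter_by_reprojection(points_3d, observed_2d, T_cam_world, K, max_err_px=2.0):
    """Keep only the second 3D points whose re-projection onto the designated
    image lies within max_err_px pixels of the corresponding initial feature
    point; the survivors are the target 3D points."""
    keep = []
    for X, uv_obs in zip(points_3d, observed_2d):
        p_cam = T_cam_world[:3, :3] @ X + T_cam_world[:3, 3]
        if p_cam[2] <= 0:                    # point behind the camera: reject
            keep.append(False)
            continue
        uv = (K @ (p_cam / p_cam[2]))[:2]
        keep.append(np.linalg.norm(uv - uv_obs) <= max_err_px)
    keep = np.asarray(keep)
    return np.asarray(points_3d)[keep], keep
```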
In addition, to further improve the accuracy of the created AR point cloud map, parameter optimization may be performed on each 3D point of the created map. In one possible implementation of the present disclosure, parameter optimization may be performed on each second 3D point through a preset parameter optimization algorithm to obtain optimized 3D points, and the AR point cloud map is then adjusted in position according to the optimized 3D points.
The preset parameter optimization algorithm may be, for example, a BA (Bundle adjustment) algorithm, and a specific implementation manner of performing parameter optimization on the second 3D point by using the BA algorithm to obtain an optimized 3D point may be referred to in the related literature, which is not limited herein.
In addition, while the parameters of the second 3D points are optimized, the camera poses can be optimized at the same time; specifically, each camera pose and each 3D point can be taken as optimization variables, and a re-projection error function can be established to perform BA optimization.
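For illustration only, the following is a heavily simplified stand-in for BA-style optimization: it refines only the 3D points with the camera poses held fixed, whereas a full BA (for example as provided by dedicated solvers) would jointly optimize poses and points; all names and conventions here are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def refine_points(points_3d, observations, poses, K):
    """Refine second 3D points by minimizing re-projection error.

    points_3d:    (P, 3) initial 3D points
    observations: list of (point_index, camera_index, observed_uv) tuples
    poses:        list of 4x4 world->camera transforms (held fixed here)
    K:            3x3 camera intrinsic matrix
    """
    def residuals(flat_pts):
        pts = flat_pts.reshape(-1, 3)
        res = []
        for pi, ci, uv in observations:
            T = poses[ci]
            p_cam = T[:3, :3] @ pts[pi] + T[:3, 3]
            proj = (K @ (p_cam / p_cam[2]))[:2]
            res.extend(proj - uv)
        return np.asarray(res)

    sol = least_squares(residuals, np.asarray(points_3d, float).ravel())
    return sol.x.reshape(-1, 3)
```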
After the AR point cloud map is created, the server can store the global feature vector of each image in a global feature description file, so that in subsequent repositioning based on the created AR point cloud map, feature matching can be performed between the global image feature and each global feature vector stored in the file. In addition, after the AR point cloud map is created, the server can record the correspondence between each 3D point in the AR point cloud map and each 2D feature point in the images, for use in subsequent repositioning.
In the actual process of creating the AR point cloud map, the AR point cloud map created based on the above steps should not be too large, since an overly large map may result in a decrease in accuracy. Therefore, in one possible implementation of the present disclosure, the above process may be repeated to form multiple small point cloud maps, which are then merged into one large map.
It will be appreciated that two point cloud maps need to have a common picture to merge, the common picture being a picture that characterizes the existence of a repeated common region between the two point cloud maps.
FIG. 6 is a flow chart illustrating a method of creating an AR point cloud map according to the embodiment shown in FIG. 3, the method including, as shown in FIG. 6, the steps of:
in step S309, for each two point cloud maps of the plurality of AR point cloud maps, it is determined whether the two point cloud maps correspond to a common picture, which is a picture representing that there is a repeated common area between the two point cloud maps.
In one possible implementation manner of this step, for each of the two point cloud maps, GPS positioning data corresponding to a picture for creating the point cloud map is determined, and then whether the two point cloud maps correspond to the common picture is determined according to the GPS positioning data corresponding to the picture.
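As an illustration of this GPS-based check, the following sketch flags two maps as candidates for merging when any pair of their source pictures was taken within a small distance of each other; the haversine formula and the 10 m threshold are assumptions for the example only.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def maps_share_common_picture(gps_list_a, gps_list_b, max_dist_m=10.0):
    """gps_list_a / gps_list_b: (lat, lon) of the pictures used to create each map."""
    return any(
        haversine_m(la, lo, lb, lob) <= max_dist_m
        for la, lo in gps_list_a
        for lb, lob in gps_list_b
    )
```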
In step S310, under the condition that the two point cloud maps correspond to the public picture, the two point cloud maps are combined to obtain a target point cloud map.
The two point cloud maps include a first point cloud map and a second point cloud map, and the public pictures include a plurality of pictures used for creating the first point cloud map. In this step, for each public picture, the camera that collected the public picture may be positioned in the second point cloud map to obtain a third pose; the specific positioning process refers to steps S102-S104 and is not described here again. Then, a second pose conversion matrix between the first point cloud map and the second point cloud map may be determined according to the third pose and a fourth pose of the camera that collected the public picture relative to the first point cloud map (the fourth pose may be obtained directly from the data file corresponding to the first point cloud map). Thus, the two point cloud maps can be merged according to the second pose conversion matrix to obtain the target point cloud map.
It should be noted that, for the plurality of public pictures, a plurality of second pose conversion matrices may be calculated respectively. In order to improve the accuracy of the merged map, a RANSAC algorithm may be used to determine an optimal pose conversion matrix from the plurality of second pose conversion matrices, so that the two point cloud maps can be merged based on the optimal pose conversion matrix to obtain the target point cloud map.
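The selection of the optimal second pose conversion matrix can be illustrated by the RANSAC-style sketch below, which scores every candidate matrix by how many public pictures it explains within a pose error threshold and keeps the best one; the sketch assumes each pose is a 4x4 camera-to-map transform, and the error metric and threshold are assumptions for the example.

```python
import numpy as np

def candidate_from_picture(third_pose, fourth_pose):
    """Second pose conversion matrix implied by one public picture.
    third_pose:  4x4 camera pose of the picture in the second point cloud map
    fourth_pose: 4x4 camera pose of the same picture in the first point cloud map
    """
    return third_pose @ np.linalg.inv(fourth_pose)

def best_conversion_matrix(third_poses, fourth_poses, trans_thresh=0.05):
    candidates = [candidate_from_picture(p3, p4) for p3, p4 in zip(third_poses, fourth_poses)]
    best, best_inliers = None, -1
    for T in candidates:
        # A public picture is an inlier if T maps its first-map pose close to its second-map pose.
        inliers = sum(
            np.linalg.norm((T @ p4 - p3)[:3, 3]) < trans_thresh
            for p3, p4 in zip(third_poses, fourth_poses)
        )
        if inliers > best_inliers:
            best, best_inliers = T, inliers
    return best
```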
In addition, as mentioned above, in order to provide a more convenient AR persistence display capability for the user, the preset AR object also needs to be added to the created AR point cloud map. In one possible implementation, the preset AR object may be set offline by the server. In the present disclosure, the AR point cloud map may be subjected to densification processing to obtain a dense point cloud map; then, object parameters corresponding to the preset AR object are obtained, where the object parameters may specifically include a preset model corresponding to the preset AR object and the corresponding model parameters (for example, the model parameters may include the size, color, state and action of a pet model); the preset AR object is then rendered and added in the dense point cloud map according to the object parameters.
In another possible implementation of the present disclosure, the AR virtual object may also be added online in real time according to the operation of the user. Specifically, the server receives a target image sent by the terminal and the second camera parameter when the terminal collects the target image, where the target image is an image to which the target AR object needs to be added.
For example, in an actual application scenario, if a user wants to add a virtual pet on a table in front of him or her, the user may capture the target image of the table through the camera of a terminal (such as an AR mobile phone) and then send the target image and the second camera parameter to the server, where the second camera parameter likewise includes the camera shooting parameters when the terminal collects the target image and the camera pose calculated by the SLAM tracking system.
And repositioning the pose of the terminal relative to the AR point cloud map according to the target image and the second camera parameter to obtain a repositioning pose.
The specific implementation manner of this step is similar to the relocation process based on the AR point cloud map in steps S102 to S104, and will not be described here again.
And then the repositioning pose can be sent to the terminal, so that the terminal can determine the eighth pose of the target AR object relative to the AR point cloud map according to the repositioning pose and the seventh pose under the condition that the seventh pose of the target AR object relative to the local world coordinate system of the terminal is acquired, wherein the target AR object is an AR object added on the target image by a user.
The user can manually add the target AR object to the target image through the terminal, and the terminal can thus obtain the seventh pose. After obtaining the repositioning pose, the terminal can calculate a conversion matrix between the local world coordinate system of the terminal and the AR point cloud map coordinate system according to the repositioning pose and the seventh pose, then determine the eighth pose of the target AR object relative to the AR point cloud map based on the conversion matrix, record the target AR object and the eighth pose, and send them to the server.
And then receiving the target AR object and the eighth pose sent by the terminal, and adding the target AR object in the AR point cloud map according to the eighth pose.
In this step, the server may render the target AR object in the AR point cloud map according to the eighth pose, thereby implementing online addition of the target AR object in the AR point cloud map.
After the AR virtual objects are added to the AR point cloud map (including target AR objects added online and/or preset AR objects added offline), each AR virtual object and its 6-degree-of-freedom pose in the AR point cloud map can be saved in an AR content pose file, so that the AR virtual objects in the AR content pose file can subsequently be displayed persistently.
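A minimal sketch of such an AR content pose file, assuming a simple JSON layout (the field names are illustrative only, not the disclosure's format):

```python
import json

def save_ar_content_pose_file(path, ar_objects):
    """ar_objects: list of dicts, each with an object id, a model reference and a
    6-degree-of-freedom pose (x, y, z, roll, pitch, yaw) in the AR point cloud map."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"ar_objects": ar_objects}, f, indent=2)

# Example entry: one virtual pet anchored on a table in the map (illustrative values).
save_ar_content_pose_file("ar_content_pose.json", [
    {"id": "pet_01", "model": "cat.glb",
     "pose": {"x": 1.2, "y": 0.8, "z": 0.0, "roll": 0.0, "pitch": 0.0, "yaw": 90.0}}
])
```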
By adopting the method for creating the AR point cloud map, the local terminal (such as an AR mobile phone, AR glasses and the like) can be directly used for acquiring the current environment image of the terminal, and the environment image is sent to the server, so that the server creates the AR point cloud map according to the environment image, and no additional image acquisition equipment is required.
Fig. 7 is a flowchart of a cloud relocation method according to an exemplary embodiment, which is applied to a terminal (such as an AR mobile phone, AR glasses, etc.) with an AR function, and as shown in fig. 7, the method includes the following steps:
in step S701, a real-time image is acquired and a first camera parameter at the time of acquiring the real-time image is acquired.
The local AR terminal tracks the local environment in real time by running a SLAM tracking system and establishes a local world coordinate system at initialization. A real-time image of the surrounding environment is captured through the camera of the AR terminal and then sent to the server. For each real-time image, the terminal also needs to record the camera intrinsic parameters (including the focal length and the position of the optical center) and the camera pose (also referred to as the terminal pose) of the terminal relative to the local world coordinate system when the camera captured that image. The camera pose refers to the 6-degree-of-freedom pose of the terminal in the local world coordinate system calculated by the SLAM tracking system; the 6-degree-of-freedom pose includes the spatial position and the rotation of the terminal in the local world coordinate system, where the rotation includes, for example, a roll angle, a pitch angle and a yaw angle. The terminal also needs to send the camera intrinsic parameters and the camera pose at the time the real-time image was captured to the server, so the first camera parameter includes the camera intrinsic parameters and the camera pose.
In step S702, relocation request data is sent to a server, where the relocation request data includes the real-time image and the first camera parameter, so that the server performs feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features, performs feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matching image, and performs relocation on a first pose of the terminal in the AR point cloud map according to the local image features, the matching image and the first camera parameter.
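A minimal sketch of what the repositioning request data sent in this step could look like on the terminal side is shown below; the field names and types are assumptions for illustration only, not a prescribed message format.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FirstCameraParameter:
    fx: float            # focal length in pixels (x)
    fy: float            # focal length in pixels (y)
    cx: float            # optical center x
    cy: float            # optical center y
    # 6-DoF camera pose (as a 4x4 matrix) in the local world coordinate system,
    # as computed by the SLAM tracking system.
    pose_local: np.ndarray = field(default_factory=lambda: np.eye(4))

@dataclass
class RelocationRequest:
    image_jpeg: bytes               # encoded real-time image
    camera: FirstCameraParameter    # first camera parameter for this image
```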
The first pose may be represented by a pose conversion matrix of a camera coordinate system corresponding to the real-time image and a point cloud map coordinate system (i.e., an anchor coordinate system).
In step S703, a repositioning result sent by the server is received, where the repositioning result includes the first pose.
By adopting this method, the terminal can send repositioning request data to the server, where the repositioning request data includes the real-time image acquired by the terminal and the first camera parameter corresponding to the real-time image. The server can perform feature extraction on the real-time image through the first pre-training deep learning model to obtain global image features and local image features; perform feature matching between the real-time image and a plurality of preset images according to the global image features to obtain a matching image, where the preset images are images used for creating the Augmented Reality (AR) point cloud map; and reposition the first pose of the terminal in the AR point cloud map according to the local image features, the matching image and the first camera parameter. Compared with directly performing image registration against the plurality of preset images based on local artificial feature points, this method first screens the plurality of preset images according to the global image features of the real-time image to obtain the matching image and then performs image registration against the matching image according to the local image features, so the repositioning result is more accurate and the repositioning efficiency is relatively higher. Furthermore, the global image features and the local image features in this method are extracted through a pre-trained deep learning model, so compared with image registration using artificial feature points, the repositioning result has better stability and accuracy.
FIG. 8 is a flow chart illustrating a method of AR persistence display, as shown in FIG. 8, according to the embodiment of FIG. 7, the method comprising the steps of:
in step S704, a mapping relationship between the AR point cloud map and the local world coordinate system of the terminal is determined according to the first pose and the second pose, where the second pose is a pose of the terminal, calculated by the SLAM tracking algorithm, relative to the local world coordinate system of the terminal.
For example, it is assumed that the first pose is represented as a pose matrix P, the second pose is represented as a pose matrix P1, and the mapping relationship is the pose conversion matrix P×P1.
In step S705, a fifth pose of the preset AR object sent by the server with respect to the AR point cloud map is received.
The preset AR object may include a preset multimedia file such as text, a picture, a video, a 3D model, a 2D/3D special effect, etc., and in one possible implementation manner, the server may obtain the fifth pose from an AR virtual content pose data file corresponding to the AR point cloud map.
In step S706, a sixth pose of the preset AR object with respect to the local world coordinate system is determined according to the mapping relationship and the fifth pose.
Assuming that the fifth pose is a pose matrix P2, the sixth pose of the preset AR object with respect to the local world coordinate system, determined according to the mapping relationship and the fifth pose, is P2×P1.
In step S707, the preset AR object is rendered on the real-time image according to the sixth pose so as to add the preset AR object on the real-time image.
By adopting the AR content persistence display method, the server can reposition the first pose of the terminal in the AR point cloud map based on the AR point cloud map, then the terminal can determine the sixth pose of the preset AR object relative to the local world coordinate system based on the repositioning result (namely the first pose), the second pose of the terminal relative to the local world coordinate system, which is calculated by the terminal through the SLAM tracking algorithm, and the fifth pose of the preset AR object relative to the AR point cloud map, so that the preset AR content can be rendered at the sixth pose in the local world coordinate system, and the persistence display of the AR content is realized.
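Using explicit 4x4 homogeneous transforms, steps S704-S706 can be sketched as below; the sketch assumes each pose is a camera-to-frame (or object-to-frame) transform, which may differ from the exact matrix convention (P, P1, P2) used in the example above.

```python
import numpy as np

def map_to_local(first_pose, second_pose):
    """Mapping from the AR point cloud map to the local world coordinate system.
    first_pose:  4x4 pose of the terminal camera in the AR point cloud map (repositioning result)
    second_pose: 4x4 pose of the same camera in the local world coordinate system (SLAM)
    """
    return second_pose @ np.linalg.inv(first_pose)

def object_pose_in_local(fifth_pose, first_pose, second_pose):
    """Sixth pose of a preset AR object in the local world coordinate system,
    given its fifth pose in the AR point cloud map."""
    return map_to_local(first_pose, second_pose) @ fifth_pose
```

The terminal can then hand the sixth pose to its rendering engine to draw the preset AR object in the real-time image (step S707).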
Illustratively, FIG. 9 is a schematic diagram illustrating a process of AR persistence display according to an exemplary embodiment. As shown in FIG. 9, after the AR terminal is started, it runs the SLAM system 105 locally, tracks the local environment in real time, and establishes a local world coordinate system. The AR terminal sends the query picture 104, and visual repositioning 106 is achieved after the query picture is registered with the point cloud map 103. After visual repositioning, the point cloud map coordinate system can be mapped to the local world coordinate system, that is, the mapping relationship between the AR point cloud map and the local world coordinate system of the terminal is obtained. Since the fifth pose of the preset AR object 103-B relative to the point cloud map is known, the sixth pose of the preset AR object 103-B in the local world coordinate system can be calculated after the visual repositioning coordinate mapping is completed. Rendering the preset AR object 103-B at the corresponding location of the local world coordinate system according to the sixth pose achieves a persistent display of the AR content. This is merely an example for illustration and does not limit the present disclosure.
In addition, the repositioning result may also include a re-projection constraint relationship, which is the correspondence between target feature points of the real-time image and the 3D points corresponding to those target feature points on the AR point cloud map. The re-projection constraint relationship includes a plurality of target 3D points and the target feature point corresponding to each target 3D point. As described above, the re-projection constraint relationship can be a set of 2D-3D point pairs, where the 2D points are the target feature points of the real-time image and the 3D points are the corresponding 3D points in the AR point cloud map. The terminal can achieve large-scale, high-precision tracking according to the re-projection constraint relationship, as shown specifically in fig. 10.
FIG. 10 is a flowchart of a method for implementing large-scale tracking based on relocation of the AR point cloud map according to the embodiment shown in FIG. 7; the method further includes the following steps:
in step S708, for each target 3D point, the target 3D point is reprojected to the real-time image by a preset reprojection algorithm, so as to obtain a reprojection point of the target 3D point corresponding to the real-time image;
the specific reprojection process of reprojecting the target 3D point onto the real-time image based on the preset reprojection algorithm to obtain the reprojected point in this step may refer to the descriptions in the related literature, which is not limited herein.
In step S709, a residual equation is determined according to the target feature point and the re-projection point corresponding to the target 3D point.
For example, the residual equation may be determined by using a BA algorithm to residual the target feature point and the re-projection point.
In step S710, the residual equation is used as a residual constraint term of the SLAM tracking algorithm to perform parameter optimization on the camera pose calculated by the SLAM tracking algorithm, so as to obtain an optimized camera pose.
Through the above optimization, the global error of the SLAM system can be eliminated, so that the AR terminal can maintain high precision when tracking in a large scene. It should also be noted that accumulated errors arise when tracking with the local SLAM system, so the position of the AR terminal in real space determined from the camera pose calculated by the local SLAM system may be inconsistent with the repositioning result. If the repositioning result is used to directly replace the positioning result of the SLAM system, an obvious jump occurs in the system positioning. With the above pose optimization method, the repositioning result is used as a residual constraint term of the local SLAM tracking algorithm to optimize the camera pose calculated by the SLAM tracking algorithm, so the accumulated errors can be eliminated gradually while the smoothness of the positioning result is ensured.
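To illustrate how the re-projection constraint of steps S708-S710 can enter the pose optimization, the sketch below refines the SLAM camera pose by jointly minimizing a prior residual toward the SLAM estimate and the re-projection residuals of the 2D-3D pairs; the parameterization (translation-only for brevity) and the weights are assumptions for the example, not the actual tightly coupled back end.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(t, R_slam, t_slam, K, pairs, prior_weight=10.0):
    """t: camera translation being optimized (rotation R_slam kept fixed for brevity).
    pairs: list of (X, x) with X a 3D map point and x its 2D target feature point."""
    res = []
    for X, x in pairs:
        p = K @ (R_slam @ X + t)             # re-project the target 3D point
        res.extend(p[:2] / p[2] - x)         # visual re-projection residual
    res.extend(prior_weight * (t - t_slam))  # residual toward the SLAM estimate (smoothness)
    return np.asarray(res)

def refine_translation(R_slam, t_slam, K, pairs):
    """t_slam: SLAM translation estimate as a length-3 numpy array."""
    result = least_squares(residuals, t_slam, args=(R_slam, t_slam, K, pairs))
    return result.x    # optimized camera translation
```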
For example, fig. 11 is a schematic diagram illustrating a process of implementing large-scale tracking based on relocation of an AR point cloud map according to an exemplary embodiment. As shown in fig. 11, after the AR terminal is started, the SLAM system 105 is run locally, real-time tracking and positioning are performed, and a local world coordinate system is established at initialization. The AR terminal sends the query picture 104 (i.e. a real-time image) and registers it with the 3D point cloud map to obtain a visual re-projection constraint relationship 108; the visual re-projection constraint relationship is added to the local SLAM back-end optimization framework to realize a SLAM and visual repositioning tightly coupled system 109. Based on this tightly coupled system, a residual equation is determined according to the re-projection constraint relationship, and the residual equation is used as a residual constraint term of the SLAM tracking algorithm to optimize the camera pose calculated by the SLAM tracking algorithm. The pose output by the system finally eliminates the global accumulated error, realizing real-time tracking and positioning 110 in a large-scale scene.
In addition, in order to add an AR virtual object online in real time according to the operation of the user, the terminal may further send a target image and the second camera parameter when the terminal collects the target image to the server, where the target image is an image to which the target AR object needs to be added; for example, if a user wants to add a virtual pet on a table in front of him or her, the user may capture the target image of the table through the camera of the terminal (such as an AR mobile phone). The terminal then receives the repositioning pose of the terminal in the AR point cloud map, which is sent by the server and determined by the server according to the target image and the second camera parameter; acquires a seventh pose, relative to the local world coordinate system of the terminal, of the target AR object added by the user on the target image; determines an eighth pose of the target AR object relative to the AR point cloud map according to the repositioning pose and the seventh pose; and sends the target AR object and the eighth pose to the server, so that the server adds the target AR object in the AR point cloud map according to the eighth pose.
Fig. 12 is a block diagram of a cloud relocation apparatus according to an exemplary embodiment, applied to a server, as shown in fig. 12, the apparatus includes:
a first receiving module 1201 configured to receive relocation request data sent by a terminal, where the relocation request data includes a real-time image collected by the terminal and a first camera parameter corresponding to the real-time image;
a feature extraction module 1202 configured to perform feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features;
the image matching module 1203 is configured to perform feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matching image, wherein the preset image is an image for creating an Augmented Reality (AR) point cloud map;
and a repositioning module 1204 configured to reposition a first pose of the terminal in the AR point cloud map according to the local image feature, the matching image, and the first camera parameter.
Optionally, the first pre-trained deep learning model comprises a first model and a second model; the feature extraction module 1202 is configured to perform feature extraction on the real-time image through the first model to obtain the global image feature, and perform feature extraction on the real-time image through the second model to obtain the local image feature.
Optionally, the image matching module 1203 is configured to output the matching image consistent with the global image feature of the real-time image in the plurality of preset images through a K-nearest neighbor algorithm according to the global image feature.
Optionally, the repositioning module 1204 is configured to register the real-time image with each of the matching images according to the local image features, so as to obtain a first feature point pair with successful registration; and repositioning the first pose of the terminal in the AR point cloud map according to the first characteristic point pair and the first camera parameter.
Optionally, the local image feature includes a plurality of feature points, and the repositioning module 1204 is configured to, for each of the matched images, input, into a second pre-training deep learning model, a plurality of feature points corresponding to the real-time image and a plurality of feature points corresponding to the matched image, and obtain the first feature point pair that is successfully registered.
Optionally, the first feature point pair includes a first feature point of the real-time image and a second feature point of the matching image; the repositioning module 1204 is configured to determine, for each of the first feature point pairs, a first 3D point of the first feature point in the AR point cloud map according to a second feature point in the first feature point pair; and calculating a first pose of the terminal in the AR point cloud map through a preset pose algorithm according to a plurality of first feature point pairs, the first 3D points corresponding to each first feature point and the first camera parameters.
Optionally, fig. 13 is a block diagram of a cloud repositioning device according to the embodiment shown in fig. 12, and as shown in fig. 13, the device further includes:
the first sending module 1205 is configured to send the first pose to the terminal, so that the terminal determines a mapping relationship between the AR point cloud map and a local world coordinate system of the terminal according to the first pose and a second pose, where the second pose is a pose of the terminal, calculated by the terminal through an instant positioning and mapping SLAM tracking algorithm, relative to the local world coordinate system.
Optionally, as shown in fig. 13, the apparatus further includes an AR cloud creation module 1206 configured to be created in advance by:
receiving environment data sent by the terminal, wherein the environment data comprises multi-frame environment images of the current environment acquired by the terminal and image parameters corresponding to each frame of environment image respectively;
for each frame of environment image, extracting global features and local features corresponding to the frame of environment image by adopting the first pre-training deep learning model, and determining a preset number of similar images corresponding to the frame of environment image from other images according to the global features corresponding to the frame of environment image;
For each frame of similar image, carrying out image registration on the frame of the environmental image and the frame of similar image according to the local features of the frame of the environmental image and the local features of the frame of similar image to obtain a registration result, wherein the registration result comprises a plurality of second feature point pairs which are successfully registered;
determining second 3D points corresponding to each second feature point respectively through a preset depth estimation algorithm according to the plurality of second feature point pairs, the image parameters of the frame environment image and the image parameters of the frame similar image;
and generating the AR point cloud map according to the second 3D points.
Optionally, the image parameters include camera shooting parameters and terminal pose of the terminal relative to a local world coordinate system when the frame of environment image is acquired; the AR cloud creating module 1206 is configured to calculate a first pose conversion matrix between cameras corresponding to the two images according to the terminal pose corresponding to the frame environment image and the terminal pose corresponding to the frame similar image; and inputting a plurality of second characteristic point pairs, camera shooting parameters of the frame environment image, camera shooting parameters of the frame similar image and the first pose conversion matrix into the preset depth estimation model to obtain the second 3D points corresponding to each second characteristic point pair respectively.
Optionally, the AR cloud creating module 1206 is further configured to, for each of the second 3D points, re-project the second 3D point to a specified image to obtain a projected point, where the specified image includes the frame environment image or the frame similar image; determining whether the second 3D point meets a preset error threshold condition according to the projection point and the initial characteristic point corresponding to the second 3D point in the designated image; deleting the second 3D point under the condition that the second 3D point does not meet the preset error threshold condition, so as to screen the second 3D point to obtain a target 3D point; and generating the AR point cloud map according to the target 3D point.
Optionally, the AR cloud creating module 1206 is further configured to perform parameter optimization on each of the second 3D points by using a preset parameter optimization algorithm to obtain an optimized 3D point; and carrying out position adjustment on the AR point cloud map according to the optimized 3D points.
Optionally, the AR cloud creation module 1206 is further configured to determine, for each two of the plurality of AR point cloud maps, whether the two point cloud maps correspond to a common picture; the public picture is a picture representing a repeated public area between the two point cloud maps; and under the condition that the two point cloud maps correspond to the public picture, merging the two point cloud maps to obtain a target point cloud map.
Optionally, the two point cloud maps include a first point cloud map and a second point cloud map, the public picture includes a plurality of pictures for creating the first point cloud map, and the AR cloud creating module 1206 is configured to locate, for each public picture, a third pose of a camera that collects the public picture on the second point cloud map; determining a second pose conversion matrix between the first point cloud map and the second point cloud map according to the third pose and a fourth pose of a camera for acquiring the public picture relative to the first point cloud map; and merging the two point cloud maps according to the second pose conversion matrix to obtain a target point cloud map.
Optionally, as shown in fig. 13, the apparatus further includes: an AR object adding module 1207 configured to perform a densification process on the AR point cloud map to obtain a dense point cloud map; and obtaining object parameters corresponding to a preset AR object, and adding the preset AR object into the dense point cloud map according to the object parameters.
Optionally, the AR object adding module 1207 is further configured to receive a target image sent by the terminal and a second camera parameter when the terminal collects the target image, where the target image is an image to which a target AR object needs to be added; repositioning the pose of the terminal relative to the AR point cloud map according to the target image and the second camera parameter to obtain a repositioning pose; sending the repositioning pose to the terminal, so that the terminal determines an eighth pose of the target AR object relative to the AR point cloud map according to the repositioning pose and the seventh pose under the condition that the seventh pose of the target AR object relative to a local world coordinate system of the terminal is acquired, wherein the target AR object is an AR object added by a user on the target image; and receiving the target AR object and the eighth pose sent by the terminal, and adding the target AR object in the AR point cloud map according to the eighth pose.
Fig. 14 is a block diagram of a cloud relocation apparatus according to an exemplary embodiment, applied to a terminal, as shown in fig. 14, the apparatus includes:
a data acquisition module 1401 configured to acquire a real-time image and to acquire a first camera parameter at the time of acquiring the real-time image;
a second sending module 1402, configured to send repositioning request data to a server, where the repositioning request data includes the real-time image and the first camera parameter, so that the server performs feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features, and performs feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, and repositioning a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameter;
a second receiving module 1403 is configured to receive a repositioning result sent by the server, where the repositioning result includes the first pose.
Optionally, fig. 15 is a block diagram of a cloud repositioning apparatus according to the embodiment shown in fig. 14, and as shown in fig. 15, the apparatus further includes:
A first determining module 1404 configured to determine a mapping relationship between the AR point cloud map and a local world coordinate system of the terminal according to the first pose and a second pose, where the second pose is a pose of the terminal, calculated by the terminal through a SLAM tracking algorithm, relative to the local world coordinate system;
a third receiving module 1405 configured to receive a fifth pose of the preset AR object sent by the server with respect to the AR point cloud map;
a second determining module 1406 configured to determine a sixth pose of the preset AR object with respect to the local world coordinate system according to the mapping relation and the fifth pose;
an AR object rendering module 1407 configured to render the preset AR object on the real-time image according to the sixth pose so as to add the preset AR object on the real-time image.
Optionally, fig. 16 is a block diagram of a cloud repositioning device according to the embodiment shown in fig. 14, where the repositioning result further includes a reprojection constraint relationship, where the reprojection constraint relationship is a corresponding relationship between a target feature point of the real-time image and a 3D point corresponding to the target feature point on the AR point cloud map; the re-projection constraint relation comprises a plurality of target 3D points and target feature points corresponding to the target 3D points respectively; as shown in fig. 16, the apparatus further includes:
A re-projection module 1408, configured to re-project, for each target 3D point, the target 3D point to the real-time image by using a preset re-projection algorithm, so as to obtain a re-projection point corresponding to the target 3D point on the real-time image;
a third determining module 1409 configured to determine a residual equation according to the target feature point and the re-projection point corresponding to the target 3D point;
the parameter optimization module 1410 is configured to use the residual equation as a residual constraint term of the SLAM tracking algorithm to perform parameter optimization on the camera pose calculated by the SLAM tracking algorithm, so as to obtain an optimized camera pose.
Optionally, fig. 17 is a block diagram of a cloud repositioning device according to the embodiment shown in fig. 14, and as shown in fig. 17, the device further includes:
a third sending module 1411 configured to send a target image and a second camera parameter when the terminal collects the target image to the server, wherein the target image is an image to which a target AR object needs to be added; receiving a repositioning pose of the terminal in the AR point cloud map, which is sent by the server and is determined by the server according to the target image and the second camera parameter; acquiring a seventh pose of the target AR object added by a user on the target image relative to a local world coordinate system of the terminal; determining an eighth pose of the target AR object relative to the AR point cloud map according to the repositioning pose and the seventh pose; and sending the target AR object and the eighth pose to the server, so that the server adds the target AR object in the AR point cloud map according to the eighth pose.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
By adopting the above apparatus, the terminal can send repositioning request data to the server, where the repositioning request data includes the real-time image acquired by the terminal and the first camera parameter corresponding to the real-time image. The server can perform feature extraction on the real-time image through the first pre-training deep learning model to obtain global image features and local image features; perform feature matching between the real-time image and a plurality of preset images according to the global image features to obtain a matching image, where the preset images are images used for creating the Augmented Reality (AR) point cloud map; and reposition the first pose of the terminal in the AR point cloud map according to the local image features, the matching image and the first camera parameter. Compared with directly performing image registration against the plurality of preset images based on local artificial feature points, this approach first screens the plurality of preset images according to the global image features of the real-time image to obtain the matching image and then performs image registration against the matching image according to the local image features, so the repositioning result is more accurate and the repositioning efficiency is relatively higher. Furthermore, the global image features and the local image features are extracted through a pre-trained deep learning model, so compared with image registration using artificial feature points, the repositioning result has better stability and accuracy.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the cloud relocation method provided by the present disclosure.
Fig. 18 is a block diagram illustrating an apparatus 1800 for cloud relocation according to an example embodiment. The apparatus may be an electronic device, for example, the apparatus 1800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 18, apparatus 1800 may include one or more of the following components: a processing component 1802, a memory 1804, a power component 1806, a multimedia component 1808, an audio component 1810, an input/output (I/O) interface 1812, a sensor component 1814, and a communication component 1816.
The processing component 1802 generally controls overall operation of the device 1800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1802 may include one or more processors 1820 to execute instructions to perform all or part of the cloud relocation method steps described above. Further, the processing component 1802 may include one or more modules that facilitate interactions between the processing component 1802 and other components. For example, the processing component 1802 may include a multimedia module to facilitate interaction between the multimedia component 1808 and the processing component 1802.
The memory 1804 is configured to store various types of data to support operations at the apparatus 1800. Examples of such data include instructions for any application or method operating on the device 1800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 1804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 1806 provides power to the various components of the device 1800. The power components 1806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 1800.
The multimedia component 1808 includes a screen between the device 1800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1808 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 1800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 1810 is configured to output and/or input audio signals. For example, the audio component 1810 includes a Microphone (MIC) configured to receive external audio signals when the device 1800 is in an operational mode, such as a call mode, a recording mode, and a speech recognition mode. The received audio signals may be further stored in the memory 1804 or transmitted via the communication component 1816. In some embodiments, audio component 1810 also includes a speaker for outputting audio signals.
The I/O interface 1812 provides an interface between the processing component 1802 and a peripheral interface module, which may be a keyboard, click wheel, button, or the like. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 1814 includes one or more sensors for providing status assessments of various aspects of the apparatus 1800. For example, the sensor assembly 1814 may detect the on/off state of the device 1800 and the relative positioning of components such as the display and keypad of the device 1800; it may also detect a change in position of the device 1800 or one of its components, the presence or absence of user contact with the device 1800, the orientation or acceleration/deceleration of the device 1800, and a change in temperature of the device 1800. The sensor assembly 1814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 1814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1816 is configured to facilitate communication between the apparatus 1800 and other devices, either wired or wireless. The device 1800 may access a wireless network based on a communication standard, such as WiFi,2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the cloud relocation method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as memory 1804, including instructions executable by processor 1820 of apparatus 1800 to perform the cloud relocation method described above. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In another exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described cloud relocation method when executed by the programmable apparatus.
Fig. 19 is a block diagram illustrating an apparatus 1900 for cloud relocation according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to fig. 19, the apparatus 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that are executable by the processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. In addition, processing component 1922 is configured to execute instructions to perform the cloud relocation method described above.
The apparatus 1900 may further include a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output (I/O) interface 1958. The apparatus 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (25)

1. A cloud relocation method, applied to a server, the method comprising:
receiving repositioning request data sent by a terminal, wherein the repositioning request data comprises a real-time image acquired by the terminal and a first camera parameter corresponding to the real-time image;
performing feature extraction on the real-time image through a first pre-training deep learning model to obtain global image features and local image features;
Respectively performing feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, wherein the preset image is an image for creating an Augmented Reality (AR) point cloud map;
and repositioning a first pose of the terminal in the AR point cloud map according to the local image features, the matching image and the first camera parameters.
2. The method of claim 1, wherein the first pre-trained deep learning model comprises a first model and a second model; the step of extracting features of the real-time image through the first pre-training deep learning model to obtain global image features and local image features comprises the following steps:
and extracting features of the real-time image through the first model to obtain the global image features, and extracting features of the real-time image through the second model to obtain the local image features.
3. The method of claim 1, wherein the performing feature matching on the real-time image and the plurality of preset images according to the global image feature to obtain a matching image includes:
and outputting the matched images which are consistent with the global image features of the real-time image in the preset images through a K-nearest neighbor algorithm according to the global image features.
4. The method of claim 1, wherein repositioning the first pose of the terminal in the AR point cloud map according to the local image features, the matching image, and the first camera parameters comprises:
registering the real-time image and each matching image according to the local image features to obtain a first feature point pair with successful registration;
and repositioning the first pose of the terminal in the AR point cloud map according to the first characteristic point pair and the first camera parameter.
5. The method of claim 4, wherein the local image features include a plurality of feature points, and the registering the real-time image with each of the matching images according to the local image features, respectively, to obtain a first feature point pair that is successfully registered includes:
and inputting a plurality of feature points corresponding to the real-time image and a plurality of feature points corresponding to the matching image into a second pre-training deep learning model aiming at each matching image to obtain the first feature point pair which is successfully registered.
6. The method of claim 4, wherein the first pair of feature points includes a first feature point of the real-time image and a second feature point of the matching image; the repositioning the first pose of the terminal in the AR point cloud map according to the first feature point pair and the first camera parameter includes:
For each first feature point pair, determining a first 3D point of a first feature point in the AR point cloud map according to a second feature point in the first feature point pair;
and calculating a first pose of the terminal in the AR point cloud map through a preset pose algorithm according to a plurality of first feature point pairs, the first 3D points corresponding to each first feature point and the first camera parameters.
7. The method of claim 1, wherein after said repositioning of said terminal's first pose in said AR point cloud map according to said local image features, said matching image, and said first camera parameters, said method further comprises:
and sending the first pose to the terminal so that the terminal can determine the mapping relation between the AR point cloud map and the local world coordinate system of the terminal according to the first pose and the second pose, wherein the second pose is the pose of the terminal relative to the local world coordinate system, which is calculated by the terminal through an instant positioning and map construction SLAM tracking algorithm.
8. The method of claim 1, wherein the AR point cloud map is pre-created by:
Receiving environment data sent by the terminal, wherein the environment data comprises multi-frame environment images of the current environment acquired by the terminal and image parameters corresponding to each frame of environment image respectively;
for each frame of environment image, extracting global features and local features corresponding to the frame of environment image by adopting the first pre-training deep learning model, and determining a preset number of similar images corresponding to the frame of environment image from other images according to the global features corresponding to the frame of environment image;
for each frame of similar image, carrying out image registration on the frame of the environmental image and the frame of similar image according to the local features of the frame of the environmental image and the local features of the frame of similar image to obtain a registration result, wherein the registration result comprises a plurality of second feature point pairs which are successfully registered;
determining second 3D points corresponding to each second feature point respectively through a preset depth estimation algorithm (such as a triangulation algorithm, a depth filtering algorithm and the like) according to the plurality of second feature point pairs, the image parameters of the frame environment image and the image parameters of the frame similar image;
and generating the AR point cloud map according to the second 3D points.
9. The method of claim 8, wherein the image parameters include camera capture parameters and terminal pose of the terminal relative to a local world coordinate system when the frame of ambient image is acquired; the determining, according to the plurality of second feature point pairs, the image parameters of the frame environment image and the image parameters of the frame similar image, the second 3D point corresponding to each second feature point pair through a preset depth estimation model includes:
According to the terminal pose corresponding to the frame environment image and the terminal pose corresponding to the frame similar image, calculating to obtain a first pose conversion matrix between cameras corresponding to the two images respectively;
and inputting a plurality of second characteristic point pairs, camera shooting parameters of the frame environment image, camera shooting parameters of the frame similar image and the first pose conversion matrix into the preset depth estimation model to obtain the second 3D points corresponding to each second characteristic point pair respectively.
10. The method according to claim 8, wherein after determining each second feature point respectively corresponding to the second 3D points by a preset depth estimation model according to the plurality of the second feature point pairs, the image parameters of the frame environment image, and the image parameters of the frame similar image, the method further comprises:
re-projecting the second 3D points to a designated image aiming at each second 3D point to obtain projection points, wherein the designated image comprises the frame environment image or the frame similar image;
determining whether the second 3D point meets a preset error threshold condition according to the projection point and the initial characteristic point corresponding to the second 3D point in the designated image;
Deleting the second 3D point under the condition that the second 3D point does not meet the preset error threshold condition, so as to screen the second 3D point to obtain a target 3D point;
the generating the AR point cloud map from the second 3D point includes:
and generating the AR point cloud map according to the target 3D point.
11. The method of claim 8, wherein the method further comprises:
for each second 3D point, carrying out parameter optimization on the second 3D point through a preset parameter optimization algorithm to obtain optimized 3D points;
and carrying out position adjustment on the AR point cloud map according to the optimized 3D points.
12. The method of claim 8, wherein after the generating the AR point cloud map from the second 3D point, the method further comprises:
determining whether each two point cloud maps correspond to a public picture according to each two point cloud maps in the plurality of AR point cloud maps; the public picture is a picture representing a repeated public area between the two point cloud maps;
and under the condition that the two point cloud maps correspond to the public picture, merging the two point cloud maps to obtain a target point cloud map.
13. The method of claim 12, wherein the two point cloud maps comprise a first point cloud map and a second point cloud map, the common picture comprises a plurality of pictures for creating the first point cloud map, and the merging the two point cloud maps to obtain the target point cloud map comprises:
aiming at each public picture, positioning a camera for acquiring the public picture at a third pose of the second point cloud map;
determining a second pose conversion matrix between the first point cloud map and the second point cloud map according to the third pose and a fourth pose of a camera for acquiring the public picture relative to the first point cloud map;
and merging the two point cloud maps according to the second pose conversion matrix to obtain a target point cloud map.
14. The method of claim 8, wherein after the generating the AR point cloud map from the second 3D point, the method further comprises:
performing densification processing on the AR point cloud map to obtain a dense point cloud map;
and obtaining object parameters corresponding to a preset AR object, and adding the preset AR object into the dense point cloud map according to the object parameters.
15. The method according to any one of claims 8-14, wherein after the generating the AR point cloud map from the second 3D point, the method further comprises:
receiving a target image sent by the terminal and a second camera parameter used when the terminal acquires the target image, wherein the target image is an image to which a target AR object is to be added;
repositioning the pose of the terminal relative to the AR point cloud map according to the target image and the second camera parameter to obtain a repositioning pose;
sending the repositioning pose to the terminal, so that, under the condition that a seventh pose of the target AR object relative to a local world coordinate system of the terminal is acquired, the terminal determines an eighth pose of the target AR object relative to the AR point cloud map according to the repositioning pose and the seventh pose, wherein the target AR object is an AR object added by a user to the target image;
and receiving the target AR object and the eighth pose sent by the terminal, and adding the target AR object in the AR point cloud map according to the eighth pose.
16. A cloud repositioning method, applied to a terminal, the method comprising:
acquiring a real-time image and acquiring a first camera parameter when the real-time image is acquired;
transmitting repositioning request data to a server, wherein the repositioning request data comprises the real-time image and the first camera parameter, so that the server performs feature extraction on the real-time image through a first pre-trained deep learning model to obtain global image features and local image features, respectively performs feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, and repositions a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameter;
and receiving a repositioning result sent by the server, wherein the repositioning result comprises the first pose.
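For illustration only: a hedged sketch of the server-side pipeline that claim 16 relies on, assuming the pre-trained model has already produced a global descriptor plus local keypoints and descriptors for the real-time image, and that each preset-image record in the map database stores a global descriptor, float32 local descriptors and the 3D map points observed at those descriptors; nearest-neighbour retrieval and RANSAC PnP stand in for the unspecified matching and repositioning back-ends, and every name is hypothetical.

```python
import numpy as np
import cv2

def relocalize(global_desc, local_kps, local_descs, db, K):
    """Sketch of a retrieval-then-PnP relocalization.

    db : list of preset-image records, each a dict with keys 'global_desc'
         (1D vector), 'local_descs' (float32 array), and 'points3d'
         (3D map point per local descriptor row).
    """
    # 1. Global retrieval: pick the preset image with the closest global descriptor.
    dists = [np.linalg.norm(global_desc - rec["global_desc"]) for rec in db]
    match = db[int(np.argmin(dists))]

    # 2. Local matching: pair query keypoints with the matched image's 3D points.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    pairs = matcher.match(local_descs, match["local_descs"])
    if len(pairs) < 4:
        return None
    obj_pts = np.float32([match["points3d"][m.trainIdx] for m in pairs])
    img_pts = np.float32([local_kps[m.queryIdx] for m in pairs])

    # 3. Pose solve: RANSAC PnP gives the first pose of the terminal in the map.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None
```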
17. The method of claim 16, wherein the method further comprises:
determining a mapping relation between the AR point cloud map and a local world coordinate system of the terminal according to the first pose and a second pose, wherein the second pose is a pose of the terminal relative to the local world coordinate system that is calculated by the terminal through a SLAM tracking algorithm;
receiving a fifth pose, sent by the server, of a preset AR object relative to the AR point cloud map;
determining a sixth pose of the preset AR object relative to the local world coordinate system according to the mapping relation and the fifth pose;
and rendering the preset AR object on the real-time image according to the sixth pose so as to add the preset AR object on the real-time image.
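For illustration only: one possible realization of the mapping relation in claim 17, assuming all poses are 4x4 homogeneous camera-to-frame or object-to-frame transforms; under a world-to-camera convention the inverses would swap, so this is a template rather than the claimed computation. Names are hypothetical.

```python
import numpy as np

def map_to_local(first_pose, second_pose, fifth_pose):
    """Convert an AR object's pose from the point cloud map to the terminal's
    local world coordinate system.

    first_pose  : 4x4 camera pose in the AR point cloud map (relocalization result).
    second_pose : 4x4 camera pose in the local world frame (SLAM tracking result).
    fifth_pose  : 4x4 pose of the preset AR object in the AR point cloud map.
    All three are assumed to be camera/object-to-frame transforms.
    """
    # Mapping from map coordinates to local world coordinates.
    T_local_from_map = second_pose @ np.linalg.inv(first_pose)
    # Sixth pose: the object expressed in the local world frame, ready for rendering.
    return T_local_from_map @ fifth_pose
```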
18. The method of claim 17, wherein the repositioning result further comprises a re-projection constraint relationship, the re-projection constraint relationship being a correspondence between a target feature point of the real-time image and a 3D point, in the AR point cloud map, corresponding to the target feature point; the re-projection constraint relationship comprises a plurality of target 3D points and target feature points respectively corresponding to the target 3D points; and the method further comprises:
for each target 3D point, re-projecting the target 3D point to the real-time image through a preset re-projection algorithm to obtain a re-projection point corresponding to the target 3D point on the real-time image;
determining a residual equation according to the target feature point corresponding to the target 3D point and the re-projection point;
and taking the residual equation as a residual constraint term of the SLAM tracking algorithm to perform parameter optimization on a camera pose calculated by the SLAM tracking algorithm, so as to obtain an optimized camera pose.
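For illustration only: a minimal sketch of refining a camera pose with the re-projection residuals as a constraint term, using an axis-angle pose parameterization and a generic least-squares solver in place of the SLAM back-end; all names, the parameterization and the solver are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_camera_pose(pose6, constraints, K):
    """Refine a camera pose using the re-projection constraint terms.

    pose6       : initial pose as a 6-vector (axis-angle rotation, translation).
    constraints : list of (target_3d_point, target_feature_uv) pairs from the
                  re-projection constraint relationship.
    K           : 3x3 camera intrinsic matrix.
    """
    def rodrigues(rvec):
        # Axis-angle to rotation matrix: R = I + sin(t) S + (1 - cos(t)) S^2.
        theta = np.linalg.norm(rvec)
        if theta < 1e-12:
            return np.eye(3)
        k = rvec / theta
        S = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        return np.eye(3) + np.sin(theta) * S + (1 - np.cos(theta)) * S @ S

    def residuals(p):
        R, t = rodrigues(p[:3]), p[3:]
        res = []
        for X, (u, v) in constraints:
            proj = K @ (R @ np.asarray(X, dtype=float) + t)   # re-project the target 3D point
            res.extend([proj[0] / proj[2] - u, proj[1] / proj[2] - v])
        return np.asarray(res)

    return least_squares(residuals, np.asarray(pose6, dtype=float)).x
```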
19. The method according to any one of claims 16-18, further comprising:
sending, to a server, a target image and a second camera parameter used when the terminal acquires the target image, wherein the target image is an image to which a target AR object is to be added;
receiving a repositioning pose of the terminal in the AR point cloud map, the repositioning pose being sent by the server and determined by the server according to the target image and the second camera parameter;
acquiring a seventh pose, relative to a local world coordinate system of the terminal, of the target AR object added by a user to the target image;
determining an eighth pose of the target AR object relative to the AR point cloud map according to the repositioning pose and the seventh pose;
and sending the target AR object and the eighth pose to the server, so that the server adds the target AR object in the AR point cloud map according to the eighth pose.
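For illustration only: computing the eighth pose also needs the terminal's local SLAM pose for the target image, which the claim leaves implicit; the sketch assumes 4x4 camera-to-frame and object-to-frame transforms, and all names are hypothetical.

```python
import numpy as np

def anchor_object_in_map(reloc_pose, slam_pose, seventh_pose):
    """Compute the eighth pose (target AR object relative to the AR point cloud map).

    reloc_pose   : 4x4 repositioning pose of the terminal camera in the map.
    slam_pose    : 4x4 pose of the same camera in the terminal's local world frame
                   (assumed available from SLAM tracking; not explicit in the claim).
    seventh_pose : 4x4 pose of the target AR object in the local world frame.
    """
    T_map_from_local = reloc_pose @ np.linalg.inv(slam_pose)
    eighth_pose = T_map_from_local @ seventh_pose      # object expressed in map coordinates
    return eighth_pose
```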
20. A cloud repositioning apparatus, applied to a server, the apparatus comprising:
a first receiving module configured to receive repositioning request data sent by a terminal, wherein the repositioning request data comprises a real-time image acquired by the terminal and a first camera parameter corresponding to the real-time image;
a feature extraction module configured to perform feature extraction on the real-time image through a first pre-trained deep learning model to obtain global image features and local image features;
an image matching module configured to respectively perform feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, wherein the preset images are images for creating an Augmented Reality (AR) point cloud map;
and a repositioning module configured to reposition a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameter.
21. A cloud repositioning apparatus, applied to a terminal, the apparatus comprising:
a data acquisition module configured to acquire a real-time image and acquire a first camera parameter when the real-time image is acquired;
a second sending module configured to send repositioning request data to a server, wherein the repositioning request data comprises the real-time image and the first camera parameter, so that the server performs feature extraction on the real-time image through a first pre-trained deep learning model to obtain global image features and local image features, performs feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, and repositions a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameter;
and a second receiving module configured to receive a repositioning result sent by the server, wherein the repositioning result comprises the first pose.
22. A cloud repositioning device, applied to a server, the device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving repositioning request data sent by a terminal, wherein the repositioning request data comprises a real-time image acquired by the terminal and a first camera parameter corresponding to the real-time image;
performing feature extraction on the real-time image through a first pre-trained deep learning model to obtain global image features and local image features;
respectively performing feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, wherein the preset images are images for creating an Augmented Reality (AR) point cloud map;
and repositioning a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameter.
23. A cloud repositioning device, applied to a terminal, the device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a real-time image and acquiring a first camera parameter when the real-time image is acquired;
transmitting repositioning request data to a server, wherein the repositioning request data comprises the real-time image and the first camera parameter, so that the server performs feature extraction on the real-time image through a first pre-trained deep learning model to obtain global image features and local image features, respectively performs feature matching on the real-time image and a plurality of preset images according to the global image features to obtain a matched image, and repositions a first pose of the terminal in the AR point cloud map according to the local image features, the matched image and the first camera parameter;
and receiving a repositioning result sent by the server, wherein the repositioning result comprises the first pose.
24. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 15.
25. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 16 to 19.
CN202210519292.4A 2022-05-12 2022-05-12 Cloud repositioning method, device and storage medium Pending CN117115244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210519292.4A CN117115244A (en) 2022-05-12 2022-05-12 Cloud repositioning method, device and storage medium

Publications (1)

Publication Number Publication Date
CN117115244A (en) 2023-11-24

Family

ID=88797133

Country Status (1)

Country Link
CN (1) CN117115244A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination