CN113190120B - Pose acquisition method and device, electronic equipment and storage medium - Google Patents

Pose acquisition method and device, electronic equipment and storage medium

Info

Publication number
CN113190120B
CN113190120B · Application CN202110510890.0A
Authority
CN
China
Prior art keywords
information
image
pose
determining
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110510890.0A
Other languages
Chinese (zh)
Other versions
CN113190120A
Inventor
夏睿 (Xia Rui)
谢卫健 (Xie Weijian)
王楠 (Wang Nan)
张也 (Zhang Ye)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang Shangtang Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Shangtang Technology Development Co Ltd filed Critical Zhejiang Shangtang Technology Development Co Ltd
Priority to CN202110510890.0A priority Critical patent/CN113190120B/en
Publication of CN113190120A publication Critical patent/CN113190120A/en
Priority to PCT/CN2021/121034 priority patent/WO2022237048A1/en
Priority to KR1020227017413A priority patent/KR102464271B1/en
Priority to TW111110513A priority patent/TW202244680A/en
Application granted granted Critical
Publication of CN113190120B publication Critical patent/CN113190120B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Studio Devices (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application relates to a pose acquisition method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a first image and a space model of an object to be scanned, the first image being an image obtained by the electronic equipment scanning the object to be scanned; in response to first pose information being missing or invalid, acquiring a second image and determining the first pose information according to the second image and the space model, the second image being an image obtained by the electronic equipment scanning the object to be scanned, and the first pose information being pose information of the electronic equipment and/or the object to be scanned; determining second pose information according to the first image, the space model and the first pose information, the second pose information being pose information of the electronic equipment and/or the object to be scanned; in response to the second pose information and the first pose information meeting a preset first condition, outputting the second pose information; otherwise, determining that the first pose information is invalid.

Description

Pose acquisition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of object recognition technologies, and in particular, to a pose acquisition method and apparatus, an electronic device, and a storage medium.
Background
With the development of augmented reality (AR) technology, AR technology is gradually being applied in many fields of production and daily life. A three-dimensional object can be identified using augmented reality technology, and an augmented reality rendering effect can be presented according to the identification result; however, in the related art, the efficiency of identifying a three-dimensional object with augmented reality technology is low and the accuracy is poor.
Disclosure of Invention
The application provides a pose acquisition method, a pose acquisition device, electronic equipment and a storage medium.
According to a first aspect of an embodiment of the present application, there is provided a pose acquisition method, including:
acquiring a first image and a space model of an object to be scanned, wherein the first image is an image obtained by scanning the object to be scanned by electronic equipment;
in response to first pose information being missing or invalid, acquiring a second image, and determining the first pose information according to the second image and the space model, wherein the second image is an image obtained by the electronic equipment scanning the object to be scanned, and the first pose information is pose information of the electronic equipment and/or the object to be scanned;
determining second pose information according to the first image, the space model and the first pose information, wherein the second pose information is pose information of the electronic equipment and/or the object to be scanned;
in response to the second pose information and the first pose information meeting a preset first condition, outputting the second pose information; otherwise, determining that the first pose information is invalid.
In one embodiment, determining the first pose information from the second image and the spatial model comprises:
acquiring at least one image frame corresponding to the second image in the spatial model, and determining first matching information between the feature points of the second image and the feature points of the at least one image frame;
acquiring point cloud corresponding to the at least one image frame in the space model, and determining second matching information between the feature point of the second image and the three-dimensional point of the point cloud according to the first matching information;
and determining the first pose information according to the first matching information and the second matching information.
In one embodiment, the acquiring at least one image frame of the spatial model corresponding to the second image comprises:
determining a similarity of each image frame in the spatial model to the second image;
and determining the image frame with the similarity higher than a preset similarity threshold value with the second image as the image frame corresponding to the second image.
In one embodiment, the determining first matching information between the feature points of the second image and the feature points of the at least one image frame includes:
acquiring feature points and descriptors of the second image and feature points and descriptors of the image frame;
determining initial matching information between the feature points of the second image and the feature points of the image frame according to the descriptor of the second image and the descriptor of the image frame;
determining a fundamental matrix and/or an essential matrix of the second image and the image frame according to the initial matching information;
and filtering the initial matching information according to the fundamental matrix and/or the essential matrix to obtain the first matching information.
In one embodiment, the determining second matching information between the feature point of the second image and the three-dimensional point of the point cloud according to the first matching information comprises:
and matching the feature points of the second image matched with the feature points of the image frame with the three-dimensional points of the point cloud corresponding to the feature points of the image frame to obtain second matching information.
In one embodiment, the determining the first pose information according to the first matching information and the second matching information includes:
acquiring the gravity acceleration of the electronic equipment;
and determining the first pose information according to the first matching information, the second matching information and the gravity acceleration.
In one embodiment, the determining second pose information from the first image, the spatial model, and the first pose information comprises:
determining third pose information corresponding to the first image according to the first pose information and the first image, wherein the third pose information is pose information of the electronic equipment relative to the object to be scanned;
determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the space model according to the third pose information;
in response to the third matching information meeting a preset second condition, determining, according to the third pose information, fourth matching information between the feature point of the first image and the feature point of at least one image frame of the space model;
and determining the second pose information according to the third matching information and the fourth matching information.
In one embodiment, the first pose information comprises fourth pose information, wherein the fourth pose information is pose information of the object to be scanned in a world coordinate system;
determining third pose information corresponding to the first image according to the first pose information and the first image, including:
acquiring fifth pose information from a positioning module according to the first image, wherein the fifth pose information is pose information of the electronic equipment in a world coordinate system;
and determining the third pose information according to the fourth pose information and the fifth pose information.
In one embodiment, the determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the space model according to the third pose information includes:
projecting the point cloud of the space model onto the first image according to the third pose information to form a plurality of projection points, and extracting a descriptor of each projection point;
extracting feature points and descriptors of the first image;
and determining third matching information between the feature points and the three-dimensional points of the point cloud according to the descriptors corresponding to the feature points and the descriptors of the projection points.
In one embodiment, the determining fourth matching information between the feature point of the first image and the feature point of at least one image frame of the spatial model according to the third pose information includes:
determining at least one image frame matched with the third pose information according to the third pose information and pose information of the image frame of the space model;
acquiring the feature points and descriptors of the first image and the feature points and descriptors of the image frame matched with the third pose information;
and determining fourth matching information between the feature points of the first image and the feature points of the image frame according to the descriptor of the first image and the descriptor of the image frame.
In one embodiment, the determining the second pose information according to the third matching information and the fourth matching information includes:
acquiring the gravity acceleration of the electronic equipment;
and determining the second pose information according to the third matching information, the fourth matching information and the gravity acceleration.
In one embodiment, the second pose information and the first pose information meeting a preset first condition includes:
the error between the second pose information and the first pose information being smaller than a preset error threshold; and/or,
the third matching information meeting a preset second condition includes:
the number of matching combinations between the first image and the point cloud of the space model being greater than a preset number threshold, wherein a matching combination includes a feature point and a three-dimensional point that match each other.
In one embodiment, the obtaining a spatial model of an object to be scanned includes:
acquiring multiple frames of modeling images obtained by the electronic equipment scanning the object to be scanned, and synchronously acquiring sixth pose information corresponding to each frame of modeling image;
matching the feature points of the multiple frames of modeling images, and triangulating the feature points according to the matching result to form a point cloud;
determining at least one image frame from the multiple frames of modeling images, and determining the point cloud corresponding to each image frame;
and constructing the at least one image frame, the sixth pose information corresponding to each image frame and the point cloud into a space model.
According to a second aspect of embodiments of the present application, there is provided a pose acquisition apparatus including:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a first image and a space model of an object to be scanned, and the first image is an image obtained by scanning the object to be scanned by electronic equipment;
the first pose module is used for, in response to first pose information being missing or invalid, acquiring a second image and determining the first pose information according to the second image and the space model, wherein the second image is an image obtained by the electronic equipment scanning the object to be scanned, and the first pose information is pose information of the electronic equipment and/or the object to be scanned;
the second pose module is used for determining second pose information according to the first image, the space model and the first pose information, wherein the second pose information is pose information of the electronic equipment and/or the object to be scanned;
the output module is used for, in response to the second pose information and the first pose information meeting a preset first condition, outputting the second pose information; otherwise, determining that the first pose information is invalid.
In one embodiment, the first pose module is specifically configured to:
acquiring at least one image frame corresponding to the second image in the spatial model, and determining first matching information between the feature points of the second image and the feature points of the at least one image frame;
acquiring the point cloud corresponding to the at least one image frame in the space model, and determining second matching information between the feature points of the second image and the three-dimensional points of the point cloud according to the first matching information;
determining the first pose information according to the first matching information and the second matching information.
In an embodiment, the first pose module, when configured to obtain at least one image frame of the spatial model corresponding to the second image, is specifically configured to:
determining a similarity of each image frame in the spatial model to the second image;
and determining the image frame with the similarity higher than a preset similarity threshold value with the second image as the image frame corresponding to the second image.
In one embodiment, when the first pose module is configured to determine first matching information between the feature point of the second image and the feature point of the at least one image frame, the first pose module is specifically configured to:
acquiring feature points and descriptors of the second image and feature points and descriptors of the image frame;
determining initial matching information between the feature points of the second image and the feature points of the image frame according to the descriptor of the second image and the descriptor of the image frame;
determining a fundamental matrix and/or an essential matrix of the second image and the image frame according to the initial matching information;
and filtering the initial matching information according to the fundamental matrix and/or the essential matrix to obtain the first matching information.
In an embodiment, when the first pose module is configured to determine second matching information between the feature point of the second image and the three-dimensional point of the point cloud according to the first matching information, the first pose module is specifically configured to:
and matching the feature points of the second image matched with the feature points of the image frame with the three-dimensional points of the point cloud corresponding to the feature points of the image frame to obtain the second matching information.
In an embodiment, the first pose module is configured to, when determining the first pose information according to the first matching information and the second matching information, specifically:
acquiring the gravity acceleration of the electronic equipment;
determining the first pose information according to the first matching information, the second matching information and the gravity acceleration.
In one embodiment, the second pose module is specifically configured to:
determining third pose information corresponding to the first image according to the first pose information and the first image, wherein the third pose information is pose information of the electronic equipment relative to the object to be scanned;
determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the space model according to the third pose information;
in response to the third matching information meeting a preset second condition, determining, according to the third pose information, fourth matching information between the feature point of the first image and the feature point of at least one image frame of the space model;
and determining the second pose information according to the third matching information and the fourth matching information.
In one embodiment, the first pose information comprises fourth pose information, wherein the fourth pose information is pose information of the object to be scanned in a world coordinate system;
the second pose module is configured to, when determining third pose information corresponding to the first image according to the first pose information and the first image, specifically:
acquiring fifth pose information from a positioning module according to the first image, wherein the fifth pose information is pose information of the electronic equipment in a world coordinate system;
and determining the third pose information according to the fourth pose information and the fifth pose information.
In an embodiment, the second pose module, when determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the space model according to the third pose information, is specifically configured to:
projecting the point cloud of the space model onto the first image according to the third pose information to form a plurality of projection points, and extracting a descriptor of each projection point;
extracting feature points and descriptors of the first image;
and determining third matching information between the feature point and the three-dimensional point of the point cloud according to the descriptor corresponding to the feature point and the descriptor of the projection point.
In an embodiment, the second pose module, when determining fourth matching information between the feature point of the first image and the feature point of the at least one image frame of the spatial model according to the third pose information, is specifically configured to:
determining at least one image frame matched with the third pose information according to the third pose information and pose information of the image frame of the space model;
acquiring a feature point and a descriptor of the first image and a feature point and a descriptor of an image frame matched with the third pose information;
and determining fourth matching information between the feature points of the first image and the feature points of the image frame according to the descriptor of the first image and the descriptor of the image frame.
In an embodiment, the second pose module is configured to, when determining the second pose information according to the third matching information and the fourth matching information, specifically:
acquiring the gravity acceleration of the electronic equipment;
and determining the second pose information according to the third matching information, the fourth matching information and the gravity acceleration.
In one embodiment, the second pose information and the first pose information meeting a preset first condition includes:
the error between the second pose information and the first pose information being smaller than a preset error threshold; and/or,
the third matching information meeting a preset second condition includes:
the number of matching combinations between the first image and the point cloud of the space model being greater than a preset number threshold, wherein a matching combination includes a pair of a feature point and a three-dimensional point that match each other.
In one embodiment, when the obtaining module is configured to obtain a spatial model of an object to be scanned, the obtaining module is specifically configured to:
acquiring multiple frames of modeling images obtained by the electronic equipment scanning the object to be scanned, and synchronously acquiring sixth pose information corresponding to each frame of modeling image;
matching the feature points of the multiple frames of modeling images, and triangulating the feature points according to the matching result to form a point cloud;
determining at least one image frame from the multiple frames of modeling images, and determining the point cloud corresponding to each image frame;
and constructing the at least one image frame, the sixth pose information corresponding to each image frame and the point cloud into a space model.
According to a third aspect of embodiments herein, there is provided an electronic device comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of the first aspect when executing the computer instructions.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first aspect.
According to the above embodiments, the electronic equipment obtains a first image by scanning an object to be scanned, together with a space model of the object; in response to first pose information being missing or invalid, a second image is obtained and the first pose information is determined according to the second image and the space model; second pose information is then determined according to the first image, the space model and the first pose information; finally, the second pose information is output in response to the second pose information and the first pose information meeting a preset first condition, and otherwise the first pose information is determined to be invalid. Because the first pose information is determined from the space model and a second image obtained by the electronic equipment scanning the object, once determined it can be reused to determine the second pose information corresponding to multiple frames of first images, and it is only updated when the second pose information and the first pose information no longer meet the first condition. This improves the efficiency and accuracy of acquiring pose information, that is, the efficiency and accuracy of identifying a three-dimensional object using augmented reality technology.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart of a pose information acquisition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an electronic device acquiring an image according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a spatial model acquisition process according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a pose information acquisition apparatus shown in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations set forth in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with certain aspects of the application, as recited in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
In the related art, when a three-dimensional object is identified using augmented reality technology, the electronic device displays a space model while presenting a preview image obtained by scanning the object to be scanned. The user must manually align the space model with the preview image of the object, that is, find a suitable viewing angle at which the outline of the object presented on the electronic device matches the outline of the space model; only on this basis can the object be tracked by scanning. Once tracking fails, the user must return to the initially found viewing angle and realign the space model with the preview image. As a result, the efficiency and accuracy of tracking the object to be scanned are low, the operation is difficult for the user, and the user experience is poor.
In a first aspect, at least one embodiment of the present application provides a pose acquisition method, please refer to fig. 1, which illustrates a flow of the method, including steps S101 to S103.
The method may be performed by an electronic device such as a terminal device or a server, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA) handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server, which may be a local server, a cloud server, or the like.
In step S101, a first image and a spatial model of an object to be scanned are obtained, where the first image is an image obtained by scanning the object to be scanned by an electronic device.
The electronic device may be a terminal device such as a mobile phone or a tablet computer, or an image acquisition device such as a camera or a scanning device. When the electronic device is a terminal device, the acquisition of the first image in this step, the determination and output of the second pose information in the subsequent steps, and the determination and update of the first pose information may all be performed by the terminal device. The object to be scanned may be a three-dimensional object targeted by augmented reality technology.
When the electronic equipment scans the object to be scanned, multiple frames of first images can be obtained continuously, i.e., an image sequence is obtained. The first image is any one frame in the image sequence; that is, the pose acquisition method provided by the embodiments of the application can be executed for any frame in the image sequence. Optionally, while the electronic device scans the object to be scanned, the method may be executed for each obtained frame of first image, so as to obtain the second pose information corresponding to each frame of first image. For example, fig. 2 shows the electronic device moving around the object to be scanned while acquiring images: the device acquires one image at the position of the frame before the previous frame, then moves to the position of the previous frame and acquires another image, and then moves to the position of the current frame and acquires a further image.
The space model includes a point cloud of the object to be scanned, at least one image frame, and pose information corresponding to each image frame (such as the sixth pose information mentioned below). An image frame can be understood as an image obtained by the electronic device photographing the object to be scanned under the corresponding sixth pose information. Each image frame corresponds to a part of the point cloud; the correspondence can be determined by the triangulation relation of image feature points during modeling, or by the pose information.
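For concreteness, the following is a minimal sketch of how the space model described above could be organized in code. The structure, field names and the use of NumPy arrays are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ImageFrame:
    image: np.ndarray        # H x W grayscale pixels of the frame
    pose: np.ndarray         # 4x4 pose at capture time ("sixth pose information")
    keypoints: np.ndarray    # N x 2 pixel coordinates of feature points
    descriptors: np.ndarray  # N x 32 binary descriptors (e.g. ORB)
    point_ids: np.ndarray    # index into SpatialModel.points per keypoint, -1 if untriangulated

@dataclass
class SpatialModel:
    points: np.ndarray                           # M x 3 triangulated point cloud of the object
    frames: list = field(default_factory=list)   # the model's image frames
```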
In step S102, in response to first pose information being missing or invalid, a second image is obtained, and the first pose information is determined according to the second image and the space model, where the second image is an image obtained by the electronic device scanning the object to be scanned, and the first pose information is pose information of the electronic device and/or the object to be scanned.
When the method is first run, the first pose information is missing, so the first pose information needs to be determined.
The pose information of the electronic device may be pose information (T_cw) of the electronic device in a world coordinate system, i.e., pose information of the electronic device relative to the origin of the world coordinate system. The pose information of the object to be scanned may be pose information (T_ow) of the object to be scanned in the world coordinate system, i.e., pose information of the object to be scanned relative to the origin of the world coordinate system. The pose information of the electronic device and the object to be scanned together may be pose information (T_co) of the electronic device relative to the object to be scanned.
In step S103, second pose information is determined according to the first image, the spatial model and the first pose information, where the second pose information is pose information of the electronic device and/or the object to be scanned.
For each frame of the first image, the first pose information is used when determining the corresponding second pose information, and the first pose information is reusable until it is updated. Because the first pose information is used, the user no longer needs to manually align the model with the object to be scanned, so the efficiency and accuracy of obtaining the second pose information, and hence of tracking the object to be scanned, can be improved.
The first pose information can be determined by a detector or detection module, which acquires an image scanned by the electronic equipment as the second image and determines the first pose information according to the second image and the space model; in other words, the detector provides a tracking starting point that guides the tracker in tracking the object to be scanned. The second pose information can be determined by a tracker or tracking module, which acquires an image scanned by the electronic equipment as the first image and determines the second pose information using the first image, the space model and the first pose information, i.e., the tracker tracks the object to be scanned. When the first pose information is determined, only the second image and the space model are available, with no other guiding information; when the second pose information is determined, the guidance of the first pose information is added on top of the first image and the space model. Determining the first pose information is therefore slower, i.e., less efficient, than determining the second pose information. Determining the first pose information thus improves the accuracy of the second pose information, while reusing the first pose information improves efficiency.
It should be noted that a frame of image scanned by the electronic device can serve as the first image, as the second image, or as both. When the first pose information is missing or invalid, i.e., needs to be determined or updated, an image scanned by the electronic equipment can be used as the second image; when the first pose information exists and is valid, i.e., does not need to be determined or updated, an image scanned by the electronic equipment can be used as the first image. When a frame of image is used as the second image to determine the first pose information, and the electronic device has not yet scanned the next frame (for example, the device has not moved relative to the object to be scanned, or the acquisition period of the next frame has not yet arrived after moving), the same frame can continue to be used as the first image to determine the second pose information.
In step S104, in response to the second pose information and the first pose information meeting a preset first condition, the second pose information is output; otherwise, it is determined that the first pose information is invalid.
In a possible implementation, an error threshold may be preset, and the first condition is that the error between the second pose information and the first pose information is smaller than the error threshold. When comparing the errors of the first pose information and the second pose information, poses of the same type are compared: the pose of the electronic device in the world coordinate system in the first pose information may be compared with that in the second pose information; the pose of the object to be scanned in the world coordinate system in the first pose information may be compared with that in the second pose information; or the pose of the electronic device relative to the object to be scanned in the first pose information may be compared with that in the second pose information.
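A hedged sketch of such a first-condition check follows: two 4x4 poses of the same type are compared, and the error is split into a rotation angle and a translation distance, each tested against its own threshold. This decomposition and the threshold values are assumptions; the patent only requires an error below a preset threshold.

```python
import numpy as np

def pose_error(T_a: np.ndarray, T_b: np.ndarray):
    """Return (rotation error in radians, translation error) between two 4x4 poses."""
    dT = np.linalg.inv(T_a) @ T_b
    # Rotation angle recovered from the trace of the relative rotation matrix.
    cos_angle = np.clip((np.trace(dT[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    rot_err = float(np.arccos(cos_angle))
    trans_err = float(np.linalg.norm(dT[:3, 3]))
    return rot_err, trans_err

def meets_first_condition(T_first, T_second, rot_thresh=0.05, trans_thresh=0.02):
    # Both thresholds are illustrative values, not taken from the patent.
    rot_err, trans_err = pose_error(T_first, T_second)
    return rot_err < rot_thresh and trans_err < trans_thresh
```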
If the second pose information and the first pose information meet the first condition, the two are consistent and both are valid poses, so the second pose information is output, i.e., the second pose information of this frame of the first image is output, and the first pose information can continue to be used for determining the second pose information of the next frame of the first image. The second pose information is more comprehensive than the first pose information, is specific to each frame of the first image, and is efficient to determine, so outputting the second pose information makes tracking the object to be scanned more convenient.
If the second pose information and the first pose information do not meet the first condition, the two are inconsistent and at least one of them is an invalid pose, so the second pose information cannot be output as a valid pose, i.e., no valid pose is obtained for this frame of the first image. At the same time, the first pose information cannot continue to be used for determining the second pose information of the next frame of the first image, i.e., the first pose information needs to be updated; at this point the first pose information can be determined to be invalid. Updating the first pose information means re-acquiring a second image, determining the first pose information again from the re-acquired second image, and deleting the original first pose information.
In addition, after the second pose information is output, a corresponding augmented reality rendering effect can be presented according to the second pose information.
According to the above embodiments, the electronic equipment obtains a first image by scanning an object to be scanned, together with a space model of the object; in response to first pose information being missing or invalid, a second image is obtained and the first pose information is determined according to the second image and the space model; second pose information is then determined according to the first image, the space model and the first pose information; finally, the second pose information is output in response to the second pose information and the first pose information meeting a preset first condition, and otherwise the first pose information is determined to be invalid. Because the first pose information is determined from the space model and a second image obtained by the electronic equipment scanning the object, once determined it can be reused to determine the second pose information corresponding to multiple frames of first images, and it is only updated when the second pose information and the first pose information no longer meet the first condition. This improves the efficiency and accuracy of obtaining pose information, and thus the efficiency and accuracy of identifying a three-dimensional object using augmented reality technology.
In some embodiments of the application, the first pose information may be determined from the second image and the space model as follows. First, at least one image frame corresponding to the second image in the space model is obtained, and first matching information between the feature points of the second image and the feature points of the at least one image frame is determined (since the feature points of the second image and of the image frame are both two-dimensional points, the first matching information is a 2D-2D matching). Then, the point cloud corresponding to the at least one image frame in the space model is acquired, and second matching information between the feature points of the second image and the three-dimensional points of the point cloud is determined according to the first matching information (since the feature points of the second image are two-dimensional points, the second matching information is a 2D-3D matching). Finally, the first pose information is determined according to the first matching information and the second matching information.
When obtaining the at least one image frame corresponding to the second image in the space model: the similarity between each image frame in the space model and the second image may first be determined, and then each image frame whose similarity with the second image is higher than a preset similarity threshold is determined to be an image frame corresponding to the second image. The similarity threshold is preset; the higher the threshold, the fewer image frames corresponding to the second image are selected, and the lower the threshold, the more are selected. The pose information of an image frame corresponding to the second image is the same as or similar to that of the second image. In one example, the similarity between an image frame and the second image may be obtained by computing Euclidean distances between the feature points of the image frame and the feature points of the second image.
Optionally, the image frames in the space model may be converted into image retrieval information, and sufficient feature points of the second image extracted, so that image frames whose similarity with the second image exceeds the similarity threshold are found by image retrieval. The descriptors of all image frames may be clustered layer by layer with a clustering algorithm (e.g., the k-means algorithm) to obtain image retrieval information representing the word composition of the descriptors. Image retrieval then amounts to defining the condition that the similarity to the feature points of the second image exceeds the similarity threshold, traversing each entry of the retrieval information with this condition, screening out the entries that satisfy it, and taking the image frames corresponding to the screened entries as the image frames whose similarity with the second image is higher than the threshold.
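As a simplified stand-in for the clustered retrieval index described above, the sketch below scores each model frame by the fraction of mutually matched binary descriptors and keeps frames above a similarity threshold. The scoring rule and threshold are assumptions, and `frame.descriptors` follows the illustrative structure sketched earlier; a real bag-of-words index would avoid the brute-force comparison.

```python
import cv2

def find_similar_frames(query_descriptors, frames, sim_threshold=0.2):
    """Return model frames whose descriptor-match ratio with the query exceeds the threshold."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # mutual-nearest check
    selected = []
    for frame in frames:
        matches = matcher.match(query_descriptors, frame.descriptors)
        similarity = len(matches) / max(len(query_descriptors), 1)
        if similarity > sim_threshold:
            selected.append(frame)
    return selected
```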
When determining the first matching information between the feature points of the second image and the feature points of the at least one image frame: the feature points and descriptors of the second image and of the image frame may first be obtained; initial matching information between the feature points of the second image and the feature points of the image frame is determined according to the descriptors of the second image and of the image frame; a fundamental matrix and/or an essential matrix of the second image and the image frame is then determined according to the initial matching information; finally, the initial matching information is filtered according to the fundamental matrix and/or the essential matrix to obtain the first matching information.
Optionally, when determining the initial matching information, for each descriptor in the second image the descriptor with the closest Hamming distance may be found in the image frame, and conversely, for each descriptor in the image frame the descriptor with the closest Hamming distance may be found in the second image. If a descriptor in the second image and a descriptor in the image frame are each other's nearest neighbour in Hamming distance, the two descriptors are considered matched, and the two feature points corresponding to them are determined to match; all mutually matched feature points constitute the initial matching information.
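A minimal sketch of this mutual-nearest-neighbour rule follows, assuming binary descriptors packed as rows of uint8 (as ORB produces). It builds the full distance matrix, so it is only suitable for small descriptor sets.

```python
import numpy as np

def hamming_dist_matrix(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    # Popcount of the XOR between every pair of packed binary descriptors.
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)

def mutual_matches(desc_a: np.ndarray, desc_b: np.ndarray):
    """Return (i, j) index pairs that are each other's Hamming nearest neighbour."""
    d = hamming_dist_matrix(desc_a, desc_b)
    best_b_for_a = d.argmin(axis=1)   # nearest B descriptor for each A descriptor
    best_a_for_b = d.argmin(axis=0)   # nearest A descriptor for each B descriptor
    return [(i, j) for i, j in enumerate(best_b_for_a) if best_a_for_b[j] == i]
```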
Alternatively, the fundamental matrix and/or the essential matrix may be computed with the Random Sample Consensus (RANSAC) algorithm. Preferably, several fundamental and/or essential matrices can be computed via RANSAC and a five-point algorithm, the inliers of each matrix determined, and the matrix with the largest number of inliers taken as the final result. If two mutually matched feature points are consistent with the fundamental matrix and/or essential matrix, they are inliers; conversely, if they are not consistent with it, they are outliers. When the initial matching information is filtered with the fundamental matrix and/or the essential matrix, the inliers in the initial matching information are retained, i.e., the outliers in the initial matching information are deleted.
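The sketch below shows this filtering step with OpenCV's RANSAC-based five-point essential-matrix estimator. Using only the essential matrix (rather than the fundamental matrix as well), and the specific parameter values, are assumptions; the intrinsic matrix K is an assumed input.

```python
import cv2
import numpy as np

def filter_matches_with_essential(pts_img, pts_frame, K, threshold=1.0):
    """pts_img / pts_frame: N x 2 arrays of mutually matched pixel coordinates."""
    E, inlier_mask = cv2.findEssentialMat(
        pts_img, pts_frame, K, method=cv2.RANSAC,
        prob=0.999, threshold=threshold)
    inliers = inlier_mask.ravel().astype(bool)
    # Keep only the inlier matches; outliers are deleted from the initial matching.
    return pts_img[inliers], pts_frame[inliers], E
```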
When determining, according to the first matching information, the second matching information between the feature points of the second image and the three-dimensional points of the point cloud: each feature point of the second image that matches a feature point of the image frame may be matched with the three-dimensional point of the point cloud corresponding to that image-frame feature point, yielding the second matching information. That is, the feature points of the second image are matched to the three-dimensional points of the point cloud using the image-frame feature points as an intermediary.
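A minimal sketch of this transitive 2D-3D association follows, reusing the illustrative `point_ids` field from the earlier model sketch: a second-image feature simply inherits the three-dimensional point already linked to the image-frame feature it matched.

```python
def lift_matches_to_3d(first_matches, frame, model_points):
    """first_matches: list of (second_image_idx, frame_idx) keypoint index pairs."""
    second_matching = []
    for img_idx, frame_idx in first_matches:
        pid = frame.point_ids[frame_idx]
        if pid >= 0:  # the frame feature has a triangulated 3D point in the model
            second_matching.append((img_idx, model_points[pid]))
    return second_matching
```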
When determining the first pose information according to the first matching information and the second matching information: the gravity acceleration of the electronic equipment may first be acquired, and the first pose information then determined according to the first matching information, the second matching information and the gravity acceleration.
Optionally, the electronic device may have an acceleration sensor and/or a gyroscope, so the gravity acceleration can be acquired from the acceleration sensor and/or the gyroscope. In computer vision, the first pose information can be solved from the second matching information with a PnP (Perspective-n-Point) algorithm, or from the first matching information by decomposing the fundamental matrix and/or the essential matrix. In both solving processes, the gravity acceleration can be added as a constraint, i.e., the gravity acceleration constrains the rotation angles (such as the roll angle and the pitch angle) in the pose of the electronic device. The two solving processes can then be combined in a hybrid form to solve the first pose information, i.e., the first matching information, the second matching information and the gravity acceleration are used together. The solution requires six degrees of freedom: each item of first matching information provides a 1-degree-of-freedom constraint, each item of second matching information provides a 2-degree-of-freedom constraint, and the gravity acceleration provides 1 degree of freedom; a certain amount of first matching information and second matching information, plus the gravity acceleration, can therefore be selected at random and combined into six degrees of freedom to solve the first pose information. Equations can be constructed from the first matching information through Plücker-coordinate relations and from the second matching information through the camera projection matrix model, and the combined system of equations is then solved with a solver (for example, a Gröbner-basis solver). Alternatively, the two solving processes can each be used independently in a RANSAC manner to solve the first pose information robustly: according to a preset ratio of iterations, solving the first pose information from the first matching information and the gravity acceleration alternates with solving it from the second matching information and the gravity acceleration; the error of each solved first pose is computed against all matching information, and when the number of inliers is large enough (for example, exceeds a certain threshold), the first pose information at that point is determined to be accurate and the solving ends.
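Standard OpenCV does not expose the hybrid gravity-constrained solver described above, so the sketch below falls back to plain PnP-RANSAC on the second matching information alone and only marks where the gravity constraint would enter; it is an approximation under stated assumptions, not the patent's solver.

```python
import cv2
import numpy as np

def solve_first_pose(pts_2d, pts_3d, K):
    """pts_2d: N x 2 pixel points; pts_3d: N x 3 model points (second matching information)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts_3d, np.float64), np.asarray(pts_2d, np.float64),
        K, distCoeffs=None, reprojectionError=3.0, iterationsCount=200)
    if not ok:
        return None
    # In the patent's hybrid solver, the IMU gravity direction would constrain
    # the roll and pitch components of rvec here, reducing the degrees of freedom.
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)
    T[:3, 3] = tvec.ravel()
    return T  # pose of the device relative to the model coordinate frame
```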
Because the gravity acceleration constraint is added and the first matching information (2D-2D matching) and the second matching information (2D-3D matching) are combined, the first pose information obtained is more accurate, and the second pose information obtained on the basis of the first pose information is accordingly more accurate.
In the above embodiments, the first pose information may be determined by a detector or detection module for use by a tracker or tracking module.
In some embodiments of the application, the second pose information may be determined from the first image, the space model and the first pose information as follows. First, third pose information corresponding to the first image is determined according to the first pose information and the first image, the third pose information being pose information of the electronic equipment relative to the object to be scanned. Next, third matching information between the feature points of the first image and the three-dimensional points of the point cloud of the space model is determined according to the third pose information (since the feature points of the first image are two-dimensional points, the third matching information is a 2D-3D matching). Next, in response to the third matching information meeting a preset second condition, fourth matching information between the feature points of the first image and the feature points of at least one image frame of the space model is determined according to the third pose information (since the feature points of the first image and of the image frame are both two-dimensional points, the fourth matching information is a 2D-2D matching). Finally, the second pose information is determined according to the third matching information and the fourth matching information.
The first pose information may include fourth pose information, the fourth pose information being pose information (T_ow) of the object to be scanned in the world coordinate system. When the object to be scanned is stationary, the fourth pose information remains unchanged. On this basis, when determining the third pose information corresponding to the first image according to the first pose information and the first image: fifth pose information may be obtained from a positioning module according to the first image, the fifth pose information being pose information (T_cw) of the electronic equipment in the world coordinate system; the third pose information is then determined according to the fourth pose information and the fifth pose information.
Optionally, the positioning module may be a Visual-Inertial Simultaneous Localization and Mapping (VISLAM) module, which can output the pose information of the electronic device in the world coordinate system in real time while running. The pose information of the object to be scanned in the world coordinate system is the absolute pose of the object, and the pose information of the electronic equipment in the world coordinate system is the absolute pose of the device; therefore, from these two absolute poses in a unified coordinate system, the relative pose of the object and the device can be determined, namely the pose information (T_co) of the electronic equipment relative to the object to be scanned or the pose information (T_oc) of the object to be scanned relative to the electronic equipment. Here the pose information (T_co) of the electronic equipment relative to the object to be scanned is selected as the third pose information, though the pose information (T_oc) of the object relative to the device could of course be selected instead.
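The composition of the third pose from the fourth and fifth poses reduces to one matrix product. The sketch below assumes the convention that T_ab maps coordinates expressed in frame b into frame a, with 4x4 homogeneous matrices; under a different convention the inverse moves to the other factor.

```python
import numpy as np

def relative_pose(T_cw: np.ndarray, T_ow: np.ndarray) -> np.ndarray:
    """Third pose T_co from device pose T_cw (fifth) and object pose T_ow (fourth)."""
    # T_co = T_cw * T_wo, where T_wo = inv(T_ow) maps object coordinates to world.
    return T_cw @ np.linalg.inv(T_ow)
```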
When determining the third matching information between the feature points of the first image and the three-dimensional points of the point cloud of the spatial model according to the third pose information: the point cloud of the spatial model is projected onto the first image according to the third pose information to form a plurality of projection points, and a descriptor is extracted for each projection point; the feature points and descriptors of the first image are extracted; and finally, the third matching information between the feature points and the three-dimensional points of the point cloud is determined from the descriptors of the feature points and the descriptors of the projection points.
The third pose information represents the relative pose between the electronic device that captured the first image and the object to be scanned, i.e., their relative direction and angle, so the point cloud can be projected onto the first image using the camera model.
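A minimal projection sketch under a pinhole camera model follows; K (the 3x3 intrinsic matrix) and T_oc (the object-to-camera transform, i.e., the inverse of T_co above) are assumed inputs, and lens distortion is ignored.

```python
import numpy as np

def project_point_cloud(points_obj: np.ndarray, T_oc: np.ndarray,
                        K: np.ndarray):
    """points_obj: (N, 3) point cloud in the object frame.
    Returns (N, 2) pixel coordinates and a visibility mask."""
    ones = np.ones((points_obj.shape[0], 1))
    pts_cam = (T_oc @ np.hstack([points_obj, ones]).T).T[:, :3]
    in_front = pts_cam[:, 2] > 0      # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]       # perspective division
    return uv, in_front
```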
Because the three-dimensional points of the point cloud are obtained during modeling by matching and triangulating the feature points of the image frames, each three-dimensional point of the point cloud corresponds to at least one image-frame feature point; the descriptors of all feature points corresponding to a given three-dimensional point can therefore be extracted and fused to obtain the descriptor of that point's projection point.
Optionally, when determining the third matching information, the projection-point descriptor with the closest Hamming distance may be found for each feature-point descriptor, and the feature-point descriptor with the closest Hamming distance may be found for each projection-point descriptor. If a feature-point descriptor and a projection-point descriptor are each other's closest match, the two descriptors are considered matched, and the corresponding feature point and three-dimensional point are determined to match; all mutually matched feature points and three-dimensional points constitute the third matching information.
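The mutual-nearest-neighbour test can be sketched as below on binary descriptors (e.g., 32-byte ORB descriptors stored as uint8 rows); this illustrates the rule just described, not the patent's implementation, and uses a brute-force distance matrix for clarity.

```python
import numpy as np

def hamming_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise Hamming distances between (N, 32) and (M, 32) uint8 descriptors."""
    xor = a[:, None, :] ^ b[None, :, :]          # (N, M, 32) differing bytes
    return np.unpackbits(xor, axis=2).sum(axis=2)  # count differing bits

def mutual_matches(desc_a: np.ndarray, desc_b: np.ndarray):
    d = hamming_matrix(desc_a, desc_b)
    best_b = d.argmin(axis=1)  # closest b-descriptor for each a-descriptor
    best_a = d.argmin(axis=0)  # closest a-descriptor for each b-descriptor
    # keep pair (i, j) only when i and j pick each other
    return [(i, j) for i, j in enumerate(best_b) if best_a[j] == i]
```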
In this embodiment, the second condition may be that the number of matching combinations between the first image and the point cloud of the spatial model is greater than a preset number threshold, where a matching combination is a pair consisting of a mutually matched feature point and three-dimensional point. The number of matching combinations reflects, to some extent, the validity of the first pose information: if the first pose information is invalid, the number of matching combinations will inevitably be small or zero; if it is valid, the number will inevitably be larger. The second condition thus serves as a pre-judgment step before the validity of the first pose information is decided in step S104. If the third matching information does not meet the second condition, i.e., the number of matching combinations is less than or equal to the preset threshold, the first pose information and the second pose information cannot meet the first condition, so the subsequent solving of the second pose information is unnecessary and the first pose information can be directly determined to be invalid. If the third matching information meets the second condition, i.e., the number of matching combinations exceeds the threshold, the validity of the first pose information cannot yet be decided directly, so the second pose information is solved and validity is judged by whether the first pose information and the second pose information meet the first condition.
Based on this, when determining the fourth matching information between the feature points of the first image and the feature points of at least one image frame of the spatial model according to the third pose information: at least one image frame matching the third pose information may be determined from the third pose information and the pose information of each image frame of the spatial model; the feature points and descriptors of the first image and of each matched image frame are then obtained; and finally, the fourth matching information between the feature points of the first image and those of the image frames is determined from the descriptors of the first image and the descriptors of the image frames.
Each image frame has pose information (e.g., the sixth pose information described below) that characterizes the relative pose between the electronic device that acquired the frame and the object to be scanned, i.e., the frame was acquired while the device was in that relative pose. Likewise, the third pose information characterizes the relative pose between the device that acquired the first image and the object to be scanned. When the pose information of an image frame is the same as or close to that of the first image (for example, the angle difference is within a preset range), the image frame can be determined to match the first image.
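One way to realize the "angle difference within a preset range" test is to compare rotations by their geodesic angle, as in the hedged sketch below; the 20° threshold and the 4x4 pose representation are assumptions, not values from the patent.

```python
import numpy as np

def rotation_angle_deg(Ra: np.ndarray, Rb: np.ndarray) -> float:
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def matching_keyframes(T_query: np.ndarray, keyframe_poses,
                       max_angle_deg: float = 20.0):
    """Return indices of keyframes whose rotation is close to the query pose."""
    Rq = T_query[:3, :3]
    return [i for i, T in enumerate(keyframe_poses)
            if rotation_angle_deg(Rq, T[:3, :3]) <= max_angle_deg]
```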
When determining the fourth matching information, for each descriptor in the first image the closest-Hamming-distance descriptor may be searched in the image frame, and for each descriptor in the image frame the closest-Hamming-distance descriptor may be searched in the first image. If a descriptor in the first image and a descriptor in the image frame are each other's closest match, the two descriptors are considered matched, the two corresponding feature points are determined to match, and all mutually matched feature points constitute the fourth matching information.
When determining the second pose information according to the third matching information and the fourth matching information, the gravitational acceleration of the electronic device may first be obtained; the second pose information is then determined from the third matching information, the fourth matching information, and the gravitational acceleration.
Optionally, the electronic device may include an acceleration sensor and/or a gyroscope, from which the gravitational acceleration can be acquired. In computer vision, the second pose information can be solved from the third matching information with a PnP algorithm, or from the fourth matching information by decomposing the fundamental matrix and/or essential matrix. In either solving process, the gravitational-acceleration constraint can be added, i.e., the gravitational acceleration constrains the rotation angles (such as the roll angle and the pitch angle) in the pose of the electronic device.

The two solving processes can then be combined in a hybrid form to solve the second pose information, i.e., the third matching information, the fourth matching information, and the gravitational acceleration are used jointly. The solution requires six degrees of freedom in total: each piece of fourth matching information provides a 1-degree-of-freedom constraint, each piece of third matching information provides a 2-degree-of-freedom constraint, and the gravitational acceleration provides 1 degree of freedom, so a certain amount of third matching information, a certain amount of fourth matching information, and the gravitational acceleration can be randomly selected and combined to make up six degrees of freedom. The fourth matching information can be used to establish equations through Plücker coordinate relations, the third matching information can be used to establish equations through the camera projection-matrix model, and the combined system of equations is then solved with a solver (such as a Gröbner basis solver).

Alternatively, the two solving processes can each be used independently in a RANSAC manner to solve the second pose information robustly: in alternating turns, according to a preset ratio of iterations, the second pose information is solved either from the third matching information and the gravitational acceleration or from the fourth matching information and the gravitational acceleration; the error of each solved pose is evaluated against all the matching information, and when the number of inliers is large enough (for example, exceeds a certain threshold), the current second pose information is determined to be accurate and the solving ends.
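The patent's gravity-constrained hybrid solvers are bespoke, so the sketch below substitutes a standard, unconstrained RANSAC PnP on the 2D-3D (third) matches merely to illustrate the hypothesize-and-verify loop and the inlier-count acceptance test; `min_inliers`, the reprojection-error bound, and the iteration count are assumed values.

```python
import numpy as np
import cv2

def solve_second_pose_pnp(pts3d: np.ndarray, pts2d: np.ndarray,
                          K: np.ndarray, min_inliers: int = 50):
    """pts3d: (N, 3) matched point-cloud points; pts2d: (N, 2) image points."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        iterationsCount=200, reprojectionError=3.0)
    if not ok or inliers is None or len(inliers) < min_inliers:
        return None        # too few inliers: reject this pose hypothesis
    return rvec, tvec, len(inliers)
```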
In the above embodiments, the tracker or tracking module may determine the second pose information, using the first pose information obtained from the detector or detection module in the process. Because the first pose information determined by the detector or detection module is more accurate but less efficient than the tracker or tracking module, the detector determines the first pose information (which can be reused) while the tracker frequently outputs the second pose information. The detector thus supplies the tracker's tracking starting point, which improves pose-acquisition accuracy and avoids the cumbersome operation and inaccurate tracking caused by manually aligning the spatial model with the object to be scanned, while still maintaining pose-acquisition efficiency.
In some embodiments of the present disclosure, the spatial model of the object to be scanned may be obtained as follows: first, acquire multiple frames of modeling images obtained by the electronic device scanning the object to be scanned, and synchronously acquire the sixth pose information corresponding to each modeling image; match the feature points of the modeling images, and triangulate the feature points according to the matching result to form a point cloud; next, determine at least one image frame from the modeling images, and determine the point cloud corresponding to each image frame; and finally, construct the at least one image frame, the sixth pose information corresponding to each image frame, and the point cloud into a spatial model.
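For concreteness, the constructed model could be organized as in the dataclass sketch below; the field names and array shapes are illustrative assumptions, not the patent's storage format.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Keyframe:
    image: np.ndarray        # a selected modeling image
    pose: np.ndarray         # sixth pose info: 4x4 device-vs-object pose
    keypoints: np.ndarray    # (N, 2) feature-point pixel coordinates
    descriptors: np.ndarray  # (N, 32) binary descriptors
    point_ids: List[int] = field(default_factory=list)  # triangulated points

@dataclass
class SpatialModel:
    keyframes: List[Keyframe]
    point_cloud: np.ndarray  # (M, 3) triangulated three-dimensional points
```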
For feature matching, inter-frame descriptor matching or optical-flow tracking can be used. For triangulation, a landmark in three-dimensional space can be tracked across consecutive frames through frame-to-frame matching; a system of equations can be constructed from the matching relations between consecutive frames and the pose information of each frame, and solving it yields the depth of the landmark.
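A two-view version of this triangulation step can be sketched with OpenCV's DLT-based triangulator, assuming each frame's 3x4 projection matrix P = K[R|t] is known from its pose information (an assumption for illustration).

```python
import numpy as np
import cv2

def triangulate(P1: np.ndarray, P2: np.ndarray,
                pts1: np.ndarray, pts2: np.ndarray) -> np.ndarray:
    """pts1, pts2: (N, 2) matched pixel coordinates in two modeling frames.
    Returns (N, 3) triangulated three-dimensional points."""
    X_h = cv2.triangulatePoints(P1, P2, pts1.T.astype(np.float64),
                                pts2.T.astype(np.float64))
    return (X_h[:3] / X_h[3]).T  # homogeneous -> Euclidean coordinates
```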
Because the electronic device captures modeling images at a high frequency (for example, 30 Hz), only a subset of the modeling images may be selected as image frames, so that the file size of the whole model does not become too large; this facilitates subsequent file sharing and reduces the model's memory consumption when running on a mobile phone.
In one example, the spatial model obtained is shown in fig. 3: it includes the point cloud within a three-dimensional frame and the modeling image frames, each labeled with its sixth pose information. The sixth pose information may be the pose information of the electronic device relative to the object to be scanned; it can be obtained by acquiring the device's pose in the world coordinate system from a positioning module in the electronic device, such as a VISLAM module, and combining it with the previously acquired pose information of the object to be scanned in the world coordinate system.
In one embodiment, a terminal device may scan a product using the pose acquisition method provided by the present application. Suppose the product is associated with a product description and an effect demonstration. The terminal device can launch a scanning program that runs the pose acquisition method provided by the present application, so that first pose information is obtained and second pose information is output as the terminal device scans the product. When the second pose information is output, the program can use augmented reality technology to render the corresponding product description and/or effect demonstration on the terminal's display screen according to the mapping between the second pose information and the product description and/or effect demonstration. For example, when the product is a refrigerator and the second pose information indicates that the terminal device is directly facing the refrigerator's human-computer interaction interface, the explanation and/or demonstration of the interaction process can be presented using augmented reality technology.
According to a second aspect of the embodiments of the present application, there is provided a pose acquisition apparatus, please refer to fig. 4, which shows a schematic structural diagram of the apparatus, including:
an obtaining module 401, configured to obtain a first image and a spatial model of an object to be scanned, where the first image is an image obtained by scanning, by an electronic device, the object to be scanned;
a first pose module 402, configured to, in response to first pose information being missing or invalid, acquire a second image, and determine the first pose information according to the second image and the spatial model, where the second image is an image obtained by the electronic device scanning the object to be scanned, and the first pose information is pose information of the electronic device and/or the object to be scanned;
a second pose module 403, configured to determine second pose information according to the first image, the spatial model, and the first pose information, where the second pose information is pose information of the electronic device and/or the object to be scanned;
an output module 404, configured to output the second pose information in response to the second pose information and the first pose information meeting a preset first condition; otherwise, determine that the first pose information is invalid.

In some embodiments of the present disclosure, the first pose module is specifically configured to:
acquiring at least one image frame corresponding to the second image in the spatial model, and determining first matching information between the feature points of the second image and the feature points of the at least one image frame;
acquiring point cloud corresponding to the at least one image frame in the space model, and determining second matching information between the feature point of the second image and the three-dimensional point of the point cloud according to the first matching information;
determining the first pose information according to the first matching information and the second matching information.
In some embodiments of the disclosure, when the first pose module is configured to obtain at least one image frame of the spatial model corresponding to the second image, the first pose module is specifically configured to:
determining a similarity of each image frame in the spatial model to the second image;
and determining the image frame with the similarity higher than a preset similarity threshold value with the second image as the image frame corresponding to the second image.
In some embodiments of the disclosure, when the first pose module is configured to determine first matching information between the feature point of the second image and the feature point of the at least one image frame, the first pose module is specifically configured to:
acquiring feature points and descriptors of the second image and feature points and descriptors of the image frame;
determining initial matching information between the feature points of the second image and the feature points of the image frame according to the descriptor of the second image and the descriptor of the image frame;
determining a basis matrix and/or an essential matrix of the second image and the image frame according to the initial matching information;
and filtering the initial matching information according to the basic matrix and/or the essential matrix to obtain the first matching information.
In some embodiments of the disclosure, the first pose module, when determining second matching information between the feature point of the second image and the three-dimensional point of the point cloud according to the first matching information, is specifically configured to:
and matching the feature points of the second image matched with the feature points of the image frame with the three-dimensional points of the point cloud corresponding to the feature points of the image frame to obtain second matching information.
In some embodiments of the present disclosure, the first pose module, when determining the first pose information according to the first matching information and the second matching information, is specifically configured to:
acquiring the gravity acceleration of the electronic equipment;
determining the first pose information according to the first matching information, the second matching information and the gravity acceleration.
In some embodiments of the present disclosure, the second pose module is specifically configured to:
determining third pose information corresponding to the first image according to the first pose information and the first image, wherein the third pose information is pose information of the electronic equipment relative to the object to be scanned;
determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the space model according to the third pose information;
in response to the third matching information meeting a preset second condition, determining, according to the third pose information, fourth matching information between the feature point of the first image and the feature point of at least one image frame of the space model;
and determining the second pose information according to the third matching information and the fourth matching information.
In some embodiments of the present disclosure, the first pose information includes fourth pose information, where the fourth pose information is pose information of the object to be scanned in a world coordinate system;
the second pose module is configured to, when determining third pose information corresponding to the first image according to the first pose information and the first image, specifically:
acquiring fifth pose information from a positioning module according to the first image, wherein the fifth pose information is pose information of the electronic equipment in a world coordinate system;
and determining the third pose information according to the fourth pose information and the fifth pose information.
In some embodiments of the disclosure, the second pose module, when determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the space model according to the third pose information, is specifically configured to:
projecting the point cloud of the space model onto the first image according to the third pose information to form a plurality of projection points, and extracting a descriptor of each projection point;
extracting feature points and descriptors of the first image;
and determining third matching information between the feature point and the three-dimensional point of the point cloud according to the descriptor corresponding to the feature point and the descriptor of the projection point.
In some embodiments of the disclosure, the second pose module, when determining fourth matching information between the feature point of the first image and the feature point of the at least one image frame of the spatial model according to the third pose information, is specifically configured to:
determining at least one image frame matched with the third pose information according to the third pose information and pose information of the image frame of the space model;
acquiring the feature points and descriptors of the first image and the feature points and descriptors of the image frame matched with the third pose information;
and determining fourth matching information between the characteristic points of the first image and the characteristic points of the image frame according to the descriptor of the first image and the descriptor of the image frame.
In some embodiments of the present disclosure, the second pose module is configured to, when determining the second pose information according to the third matching information and the fourth matching information, specifically:
acquiring the gravity acceleration of the electronic equipment;
and determining the second pose information according to the third matching information, the fourth matching information and the gravity acceleration.
In some embodiments of the present disclosure, the second pose information and the first pose information meeting the preset first condition includes:
the error between the second pose information and the first pose information being smaller than a preset error threshold; and/or,
the third matching information meeting the preset second condition includes:
the number of matching combinations between the first image and the point cloud of the spatial model being greater than a preset number threshold, where a matching combination includes a pair of mutually matched feature point and three-dimensional point.
In some embodiments of the present disclosure, when the obtaining module is configured to obtain a spatial model of an object to be scanned, the obtaining module is specifically configured to:
acquiring a multi-frame modeling image obtained by the electronic equipment by scanning aiming at the object to be scanned, and synchronously acquiring sixth pose information corresponding to each frame modeling image;
matching the characteristic points of the multi-frame modeling image, and triangularizing the characteristic points according to a matching result to form a point cloud;
determining at least one image frame from the multi-frame modeling image, and determining a point cloud corresponding to each image frame;
and constructing the at least one image frame, the sixth pose information corresponding to each image frame and the point cloud into a space model.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
In a third aspect, at least one embodiment of the present application provides an electronic device, please refer to fig. 5, which illustrates a structure of the electronic device, where the electronic device includes a memory for storing computer instructions executable on a processor, and the processor is configured to obtain pose information based on the method according to any one of the first aspect when executing the computer instructions.
In a fourth aspect, at least one embodiment of the present application provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, performs the method of any one of the first aspect. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
In a fifth aspect, at least one embodiment of the present application provides a computer program product comprising computer readable code which, when run on a device, a processor in the device executes instructions for implementing the method according to any one of the first aspect.
In this application, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (14)

1. A pose acquisition method is characterized by comprising:
acquiring a first image and a space model of an object to be scanned, wherein the first image is an image obtained by scanning the object to be scanned by electronic equipment;
responding to the absence or invalidity of first pose information, acquiring a second image, and determining the first pose information according to the second image and the space model, wherein the second image is an image scanned by the electronic equipment for the object to be scanned, and the first pose information is pose information of the electronic equipment and/or the object to be scanned;
determining second pose information according to the first image, the space model and the first pose information, wherein the second pose information is pose information of the electronic equipment and/or the object to be scanned;
outputting the second pose information in response to the second pose information and the first pose information meeting a preset first condition; otherwise, determining that the first pose information is invalid;
the determining the first pose information according to the second image and the spatial model comprises:
acquiring at least one image frame corresponding to the second image in the space model, and determining first matching information between the feature points of the second image and the feature points of the at least one image frame;
acquiring point clouds corresponding to the at least one image frame in the space model, and determining second matching information between the characteristic point of the second image and the three-dimensional point of the point clouds according to the first matching information;
determining the first pose information according to the first matching information and the second matching information;
determining second pose information according to the first image, the spatial model and the first pose information includes:
determining third pose information corresponding to the first image according to the first pose information and the first image, wherein the third pose information is pose information of the electronic equipment relative to the object to be scanned;
determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the space model according to the third pose information;
in response to the third matching information meeting a preset second condition, determining, according to the third pose information, fourth matching information between the feature point of the first image and the feature point of at least one image frame of the space model;
and determining the second pose information according to the third matching information and the fourth matching information.
2. The pose acquisition method according to claim 1, wherein the acquiring at least one image frame corresponding to the second image in the spatial model includes:
determining a similarity of each image frame in the spatial model to the second image;
and determining the image frame with the similarity higher than a preset similarity threshold value with the second image as the image frame corresponding to the second image.
3. The pose acquisition method according to claim 1, wherein the determining first matching information between the feature point of the second image and the feature point of the at least one image frame includes:
acquiring feature points and descriptors of the second image and feature points and descriptors of the image frame;
determining initial matching information between the feature points of the second image and the feature points of the image frame according to the descriptor of the second image and the descriptor of the image frame;
determining a basis matrix and/or an essential matrix of the second image and the image frame according to the initial matching information;
and filtering the initial matching information according to the basic matrix and/or the essential matrix to obtain the first matching information.
4. The pose acquisition method according to claim 1, wherein the determining second matching information between the feature point of the second image and the three-dimensional point of the point cloud according to the first matching information includes:
and matching the feature points of the second image matched with the feature points of the image frame with the three-dimensional points of the point cloud corresponding to the feature points of the image frame to obtain second matching information.
5. The pose acquisition method according to claim 1, wherein the determining the first pose information according to the first matching information and the second matching information includes:
acquiring the gravity acceleration of the electronic equipment;
and determining the first pose information according to the first matching information, the second matching information and the gravity acceleration.
6. The pose acquisition method according to claim 1, wherein the first pose information includes fourth pose information, wherein the fourth pose information is pose information of the object to be scanned in a world coordinate system;
determining third pose information corresponding to the first image according to the first pose information and the first image, including:
acquiring fifth pose information from a positioning module according to the first image, wherein the fifth pose information is pose information of the electronic equipment in a world coordinate system;
and determining the third pose information according to the fourth pose information and the fifth pose information.
7. The pose acquisition method according to claim 1, wherein the determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the spatial model according to the third pose information includes:
projecting the point cloud of the space model onto the first image according to the third pose information to form a plurality of projection points, and extracting a descriptor of each projection point;
extracting feature points and descriptors of the first image;
and determining third matching information between the feature point and the three-dimensional point of the point cloud according to the descriptor corresponding to the feature point and the descriptor of the projection point.
8. The pose acquisition method according to claim 1, wherein the determining fourth matching information between the feature point of the first image and the feature point of at least one image frame of the spatial model according to the third pose information includes:
determining at least one image frame matched with the third pose information according to the third pose information and pose information of the image frame of the space model;
acquiring the feature points and descriptors of the first image and the feature points and descriptors of the image frame matched with the third pose information;
and determining fourth matching information between the characteristic points of the first image and the characteristic points of the image frame according to the descriptor of the first image and the descriptor of the image frame.
9. The pose acquisition method according to claim 1, wherein the determining the second pose information according to the third matching information and the fourth matching information includes:
acquiring the gravity acceleration of the electronic equipment;
and determining the second pose information according to the third matching information, the fourth matching information and the gravity acceleration.
10. The pose acquisition method according to claim 1, wherein the second pose information and the first pose information meeting the preset first condition includes:
the error between the second pose information and the first pose information being smaller than a preset error threshold; and/or,
the third matching information meeting the preset second condition includes:
the number of matching combinations between the first image and the point cloud of the spatial model being greater than a preset number threshold, wherein a matching combination includes a pair of mutually matched feature point and three-dimensional point.
11. The pose acquisition method according to any one of claims 1 to 10, wherein the acquiring a spatial model of an object to be scanned includes:
acquiring a multi-frame modeling image obtained by the electronic equipment by scanning aiming at the object to be scanned, and synchronously acquiring sixth pose information corresponding to each frame modeling image;
matching the characteristic points of the multi-frame modeling image, and triangularizing the characteristic points according to a matching result to form a point cloud;
determining at least one image frame from the multi-frame modeling image, and determining a point cloud corresponding to each image frame;
and constructing the at least one image frame, the sixth pose information corresponding to each image frame and the point cloud into a space model.
12. A pose acquisition apparatus characterized by comprising:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a first image and a space model of an object to be scanned, and the first image is an image obtained by scanning the object to be scanned by electronic equipment;
the first pose module is used for responding to the missing or invalid of first pose information, acquiring a second image and determining the first pose information according to the second image and the space model, wherein the second image is an image obtained by scanning the electronic equipment aiming at the object to be scanned, and the first pose information is pose information of the electronic equipment and/or the object to be scanned;
the second pose module is used for determining second pose information according to the first image, the space model and the first pose information, wherein the second pose information is pose information of the electronic equipment and/or the object to be scanned;
the output module is configured to output the second pose information in response to the second pose information and the first pose information meeting a preset first condition; otherwise, determine that the first pose information is invalid;
the first pose module, when configured to determine the first pose information according to the second image and the spatial model, is specifically configured to:
acquiring at least one image frame corresponding to the second image in the space model, and determining first matching information between the feature points of the second image and the feature points of the at least one image frame;
acquiring point cloud corresponding to the at least one image frame in the space model, and determining second matching information between the feature point of the second image and the three-dimensional point of the point cloud according to the first matching information;
determining the first pose information according to the first matching information and the second matching information;
the second pose module is configured to, when determining second pose information according to the first image, the spatial model, and the first pose information, specifically:
determining third pose information corresponding to the first image according to the first pose information and the first image, wherein the third pose information is pose information of the electronic equipment relative to the object to be scanned;
determining third matching information between the feature point of the first image and the three-dimensional point of the point cloud of the space model according to the third pose information;
in response to the third matching information meeting a preset second condition, determining, according to the third pose information, fourth matching information between the feature point of the first image and the feature point of at least one image frame of the space model;
and determining the second pose information according to the third matching information and the fourth matching information.
13. An electronic device, comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of any one of claims 1 to 11 when executing the computer instructions.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 11.
CN202110510890.0A 2021-05-11 2021-05-11 Pose acquisition method and device, electronic equipment and storage medium Active CN113190120B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202110510890.0A CN113190120B (en) 2021-05-11 2021-05-11 Pose acquisition method and device, electronic equipment and storage medium
PCT/CN2021/121034 WO2022237048A1 (en) 2021-05-11 2021-09-27 Pose acquisition method and apparatus, and electronic device, storage medium and program
KR1020227017413A KR102464271B1 (en) 2021-05-11 2021-09-27 Pose acquisition method, apparatus, electronic device, storage medium and program
TW111110513A TW202244680A (en) 2021-05-11 2022-03-22 Pose acquisition method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110510890.0A CN113190120B (en) 2021-05-11 2021-05-11 Pose acquisition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113190120A CN113190120A (en) 2021-07-30
CN113190120B true CN113190120B (en) 2022-06-24

Family

ID=76981167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110510890.0A Active CN113190120B (en) 2021-05-11 2021-05-11 Pose acquisition method and device, electronic equipment and storage medium

Country Status (4)

Country Link
KR (1) KR102464271B1 (en)
CN (1) CN113190120B (en)
TW (1) TW202244680A (en)
WO (1) WO2022237048A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113190120B (en) * 2021-05-11 2022-06-24 浙江商汤科技开发有限公司 Pose acquisition method and device, electronic equipment and storage medium
CN116352323A (en) * 2023-04-10 2023-06-30 深圳市贝思科尔软件技术有限公司 Interactive welding environment modeling system and method
CN116758157B (en) * 2023-06-14 2024-01-30 深圳市华赛睿飞智能科技有限公司 Unmanned aerial vehicle indoor three-dimensional space mapping method, system and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947886A (en) * 2019-03-19 2019-06-28 腾讯科技(深圳)有限公司 Image processing method, device, electronic equipment and storage medium
CN110866496A (en) * 2019-11-14 2020-03-06 合肥工业大学 Robot positioning and mapping method and device based on depth image

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101308184B1 (en) * 2011-01-13 2013-09-12 주식회사 팬택 Augmented reality apparatus and method of windows form
US10515259B2 (en) * 2015-02-26 2019-12-24 Mitsubishi Electric Research Laboratories, Inc. Method and system for determining 3D object poses and landmark points using surface patches
US10970425B2 (en) * 2017-12-26 2021-04-06 Seiko Epson Corporation Object detection and tracking
CN109463003A (en) * 2018-03-05 2019-03-12 香港应用科技研究院有限公司 Object identifying
CN109087359B (en) * 2018-08-30 2020-12-08 杭州易现先进科技有限公司 Pose determination method, pose determination apparatus, medium, and computing device
CN110930453B (en) * 2019-10-30 2023-09-08 北京迈格威科技有限公司 Target object positioning method, target object positioning device and readable storage medium
CN111199564B (en) * 2019-12-23 2024-01-05 中国科学院光电研究院 Indoor positioning method and device of intelligent mobile terminal and electronic equipment
CN111311758A (en) * 2020-02-24 2020-06-19 Oppo广东移动通信有限公司 Augmented reality processing method and device, storage medium and electronic equipment
CN111833457A (en) * 2020-06-30 2020-10-27 北京市商汤科技开发有限公司 Image processing method, apparatus and storage medium
CN112197764B (en) * 2020-12-07 2021-04-06 广州极飞科技有限公司 Real-time pose determining method and device and electronic equipment
CN112637665B (en) * 2020-12-23 2022-11-04 北京市商汤科技开发有限公司 Display method and device in augmented reality scene, electronic equipment and storage medium
CN113190120B (en) * 2021-05-11 2022-06-24 浙江商汤科技开发有限公司 Pose acquisition method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947886A (en) * 2019-03-19 2019-06-28 腾讯科技(深圳)有限公司 Image processing method, device, electronic equipment and storage medium
CN110866496A (en) * 2019-11-14 2020-03-06 合肥工业大学 Robot positioning and mapping method and device based on depth image

Also Published As

Publication number Publication date
WO2022237048A1 (en) 2022-11-17
CN113190120A (en) 2021-07-30
TW202244680A (en) 2022-11-16
KR102464271B1 (en) 2022-11-07

Similar Documents

Publication Publication Date Title
CN107990899B (en) Positioning method and system based on SLAM
CN113190120B (en) Pose acquisition method and device, electronic equipment and storage medium
CN110135455B (en) Image matching method, device and computer readable storage medium
CN107742311B (en) Visual positioning method and device
CN110582798B (en) System and method for virtual enhanced vision simultaneous localization and mapping
TWI574223B (en) Navigation system using augmented reality technology
CN110648397B (en) Scene map generation method and device, storage medium and electronic equipment
CN111445526B (en) Method, device and storage medium for estimating pose of image frame
CN110125928A (en) A kind of binocular inertial navigation SLAM system carrying out characteristic matching based on before and after frames
CN108051002A (en) Transport vehicle space-location method and system based on inertia measurement auxiliary vision
CN112435338B (en) Method and device for acquiring position of interest point of electronic map and electronic equipment
CN112219087A (en) Pose prediction method, map construction method, movable platform and storage medium
CN112750203B (en) Model reconstruction method, device, equipment and storage medium
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
JP2015532077A (en) Method for determining the position and orientation of an apparatus associated with an imaging apparatus that captures at least one image
CN110986969B (en) Map fusion method and device, equipment and storage medium
US20160210761A1 (en) 3d reconstruction
CN108430032B (en) Method and equipment for realizing position sharing of VR/AR equipment
US20190026904A1 (en) Tracking system and method thereof
CN111798373A (en) Rapid unmanned aerial vehicle image stitching method based on local plane hypothesis and six-degree-of-freedom pose optimization
KR20140054710A (en) Apparatus and method for generating 3d map
WO2023005457A1 (en) Pose calculation method and apparatus, electronic device, and readable storage medium
JP2017036970A (en) Information processor, information processing method, and program
CN105809664B (en) Method and device for generating three-dimensional image
CN114820935A (en) Three-dimensional reconstruction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40049284

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant