WO2024032101A1 - Feature map generation method, apparatus, storage medium and computer device


Info

Publication number
WO2024032101A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
feature point
feature points
target
Prior art date
Application number
PCT/CN2023/097112
Other languages
English (en)
French (fr)
Inventor
余长松
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2024032101A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751 - Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 10/761 - Proximity, similarity or dissimilarity measures
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space

Definitions

  • the present application relates to the field of computer technology, and in particular to a feature map generation method, device, computer equipment, storage medium and computer program product.
  • Feature maps can be constructed to assist positioning.
  • A feature map is a data structure that can represent the observation environment with relevant geometric features (such as points, straight lines and surfaces), thereby assisting the positioning of the moving device to be positioned.
  • For example, autonomous vehicles can be positioned by constructing feature maps.
  • a feature map generation method is provided.
  • The present application provides a method for generating a feature map, which is executed by a computer device and includes: obtaining multiple frames of images captured of a target scene, extracting image feature points from each frame of image, and determining the corresponding feature descriptors based on the positions of the extracted image feature points in the corresponding images; forming a feature point set from the image feature points, among the image feature points of the frames, that have matching relationships with each other; determining a representative feature point from the feature point set, and calculating the differences between the feature descriptors corresponding to the remaining image feature points in the feature point set and the feature descriptor corresponding to the representative feature point; determining the position error of the feature point set based on the calculated differences, iteratively updating the remaining image feature points in the feature point set based on the position error, and obtaining the updated feature point set when the iteration stop condition is met; and determining, based on the positions of the image feature points in the updated feature point set in their corresponding images, the spatial feature point corresponding to the updated feature point set, the spatial feature points being used to generate a feature map.
  • The present application also provides a feature map generation device.
  • The device includes: a feature extraction module, configured to obtain multiple frames of images captured of the target scene, extract image feature points from each frame of image, and determine the corresponding feature descriptors based on the positions of the extracted image feature points in the corresponding images;
  • a feature point set determination module, configured to form a feature point set from the image feature points, among the image feature points of the frames, that have matching relationships with each other;
  • a difference calculation module, configured to determine a representative feature point from the feature point set, and calculate the differences between the feature descriptors corresponding to the remaining image feature points in the feature point set and the feature descriptor corresponding to the representative feature point;
  • a position update module, configured to determine the position error of the feature point set based on the calculated differences, iteratively update the remaining image feature points in the feature point set based on the position error, and obtain the updated feature point set when the iteration stop condition is met;
  • and a feature map generation module, configured to determine, based on the positions of the image feature points in the updated feature point set in their corresponding images, the spatial feature point corresponding to the updated feature point set, and to generate a feature map based on the spatial feature points; the feature map is used to position, in the target scene, the moving device to be positioned.
  • The present application also provides a computer device.
  • The computer device includes a memory and a processor.
  • The memory stores computer-readable instructions.
  • When the processor executes the computer-readable instructions, the steps of the above feature map generation method are implemented.
  • The present application also provides a computer-readable storage medium.
  • The computer-readable storage medium has computer-readable instructions stored thereon, and when the computer-readable instructions are executed by a processor, the steps of the above feature map generation method are implemented.
  • The present application also provides a computer program product, which includes computer-readable instructions that implement the steps of the feature map generation method when executed by a processor.
  • Figure 1 is an application environment diagram of the feature map generation method in one embodiment
  • Figure 2 is a schematic flowchart of a feature map generation method in one embodiment
  • Figure 3 is a schematic diagram of the composition of a feature point set in an embodiment
  • Figure 4 is a schematic flowchart of generating a feature map based on spatial feature points in one embodiment
  • Figure 5 is a schematic diagram of determining the corresponding position in the input image in one embodiment
  • Figure 6 is a schematic structural diagram of a feature extraction model in one embodiment
  • Figure 7 is a schematic flowchart of the steps of determining positioning information in one embodiment
  • Figure 8 is a structural block diagram of a feature map generating device in one embodiment
  • Figure 9 is an internal structure diagram of a computer device in one embodiment
  • Figure 10 is an internal structural diagram of a computer device in another embodiment.
  • the feature map generation method provided by the embodiment of the present application can be applied to intelligent transportation systems (Intelligent Traffic System, ITS) and intelligent vehicle-road cooperative systems (Intelligent Vehicle Infrastructure Cooperative Systems, IVICS).
  • the feature map generation method provided by the embodiment of the present application can be applied in the application environment as shown in Figure 1.
  • The moving device 102 communicates with the server 104 through a network.
  • The moving device 102 refers to either a device that moves autonomously or a device that is moved passively.
  • A device that moves autonomously can be any of various vehicles, robots, etc.
  • A passively moved device can, for example, be a terminal that is carried by a user and moves along with the user; such terminals may be smartphones, tablets and portable wearable devices.
  • A photographing device is installed on the moving device 102.
  • the server 104 may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server that provides cloud computing services.
  • The photographing device on any moving device can capture the target scene to obtain multiple frames of images and send the captured frames to the server.
  • The server generates the feature map based on the frames and saves it.
  • When a moving device to be positioned moves in the target scene, it can send its inertial measurement data, speed measurement data and the target images captured in the target scene to the server.
  • The server can determine the positioning information of the moving device to be positioned based on these data and the saved feature map, and send the positioning information to the moving device to be positioned.
  • In another embodiment, the photographing device on the moving device can capture the target scene to obtain multiple frames of images, and the moving device can then generate and save a feature map based on the frames, so that when the moving device moves in the target scene again, its positioning information can be determined based on the saved feature map.
  • The feature map generated by the moving device can also be sent to the server.
  • Other moving devices to be positioned can download the feature map and determine their positioning information based on the downloaded feature map; alternatively, when a moving device to be positioned moves in the target scene, it can send its inertial measurement data, speed measurement data and the target images captured in the target scene to the server, and the server can determine the positioning information of the moving device to be positioned based on these data and the saved feature map and return the positioning information to the moving device to be positioned.
  • a feature map generation method is provided.
  • The method is executed by a computer device. Specifically, it can be executed alone by a computer device such as the moving device or the server in Figure 1, or executed by the moving device and the server in cooperation.
  • In the embodiments of the present application, the method is described as being applied to the server in Figure 1 as an example, and includes the following steps:
  • Step 202 Obtain multiple frames of images captured of the target scene, extract image feature points from each frame of image, and determine the corresponding feature descriptors based on the positions of the extracted image feature points in the corresponding images.
  • The target scene refers to the specific scene for which the feature map to be generated is intended.
  • In a vehicle scenario, the target scene can be the environment in which the vehicle is located.
  • For example, the target scene can be a scene determined by the possible driving routes of the vehicle; when the vehicle travels along such a route, the vehicle is in this scene.
  • Image feature points are certain pixels in the image that can be used to describe the characteristics of the scene, such as significant edge points, histogram-of-oriented-gradients features, Haar features, etc.
  • the feature descriptor has a one-to-one correspondence with the image feature points.
  • the feature descriptor is a representation of the statistical results of the Gaussian image gradient in the neighborhood near the feature point.
  • the feature descriptor can be used to describe the corresponding image feature points.
  • The moving device can collect multiple frames of images and transmit them to the server in real time for processing, or the moving device can be responsible only for storing the collected frames and, after image collection is completed, transfer the stored frames to the server in some way.
  • After the server obtains the multiple frames of images captured of the target scene, it can extract image feature points from each frame of image.
  • For each image feature point, the feature descriptor corresponding to that image feature point can be determined based on the position of the image feature point, so that the image feature points of each frame of image and the feature descriptor of each image feature point are obtained.
  • image feature points can be extracted using, but are not limited to, algorithms such as Good Features to Track. Corresponding functions are provided in the computer vision library OpenCV.
  • feature points of the image can also be extracted by training a machine learning model.
  • The machine learning model includes multiple convolutional layers; each convolutional layer performs different processing, and the model outputs a feature image.
  • The feature image represents the possibility of each position in the original image being a feature point, and the image feature points can then be determined based on the feature image. It can be understood that multiple image feature points can be extracted from each frame of image, where multiple means at least two.
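As an illustration of this extraction step, the sketch below detects feature points with Good Features to Track (mentioned above as available in OpenCV) and computes a descriptor at each point; using SIFT as the gradient-statistics descriptor is an assumption made for the example, not a requirement of the method.

```python
import cv2
import numpy as np

def extract_features(gray: np.ndarray, max_corners: int = 500):
    """Extract image feature points and a descriptor for each point.

    Sketch only: Good Features to Track detects the points, SIFT (a
    gradient-histogram descriptor) describes them; the patent does not
    mandate these specific algorithms.
    """
    # Detect corner-like feature points (Nx1x2 array of (x, y) positions).
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=8)
    if corners is None:
        return np.empty((0, 2), np.float32), np.empty((0, 128), np.float32)
    pts = corners.reshape(-1, 2)

    # Compute a descriptor at each detected position from the local
    # gradient statistics around the point.
    keypoints = [cv2.KeyPoint(float(x), float(y), 8) for x, y in pts]
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.compute(gray, keypoints)
    positions = np.array([kp.pt for kp in keypoints], dtype=np.float32)
    return positions, descriptors
```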
  • Step 204 The image feature points with matching relationships among the image feature points of each frame image are formed into a feature point set.
  • image feature points with matching relationships refer to similar image feature points.
  • image feature points with a matching relationship can be judged by their feature descriptors. When the feature descriptors of two image feature points reach a certain degree of approximation, they are considered to be matched.
  • Specifically, the server can divide the image feature points of each frame of image into sets according to the matching relationships between them, so as to obtain multiple feature point sets; the image feature points belonging to the same feature point set have matching relationships with each other.
  • For example, as shown in Figure 3, assume that there are 3 frames of images in total.
  • The first frame of image includes image feature points A1, A2 and A3.
  • The second frame of image includes image feature points B1, B2, B3 and B4.
  • The third frame of image includes image feature points C1, C2 and C3.
  • A1, B1 and C1 are image feature points that have a matching relationship with each other;
  • A2, B2 and C2 are image feature points that have a matching relationship with each other;
  • A3, B3 and C3 are image feature points that have a matching relationship with each other. Then A1, B1 and C1 can form feature point set 1, A2, B2 and C2 can form feature point set 2, and A3, B3 and C3 can form feature point set 3.
  • In one implementation, assuming there are M frames of images, the N image feature points of the i-th frame of image are first extracted and matched against the existing feature point sets, and this is repeated frame by frame.
  • Each feature point set includes at least one image feature point, or includes a sequence of image feature points that have matching relationships with each other. It is understandable that in practical applications, if the feature map is constructed in real time, M may not be known in advance, but the specific steps are similar: i simply keeps increasing until all images are processed.
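A minimal sketch of forming feature point sets by matching descriptors frame to frame; the ratio test and the per-frame (positions, descriptors) structure (as returned by the extraction sketch above) are illustrative choices rather than requirements of the method.

```python
import numpy as np

def build_feature_point_sets(frames, ratio=0.7):
    """Group image feature points with matching relationships into sets (tracks).

    `frames` is a list of (positions, descriptors) tuples, one per frame.
    Each point is matched to the previous frame by nearest descriptor with a
    ratio test; matched points extend an existing set, unmatched points start a
    new set.
    """
    sets = []           # each set: list of (frame_idx, point_idx)
    prev_track = {}     # point index in previous frame -> index into `sets`
    for i, (pos, desc) in enumerate(frames):
        cur_track = {}
        for k in range(len(pos)):
            set_idx = None
            if i > 0 and len(frames[i - 1][1]) > 1:
                prev_desc = frames[i - 1][1]
                dists = np.linalg.norm(prev_desc - desc[k], axis=1)
                order = np.argsort(dists)
                best, second = order[0], order[1]
                # ratio test: accept only if clearly better than the runner-up
                if dists[best] < ratio * dists[second] and best in prev_track:
                    set_idx = prev_track[best]
            if set_idx is None:
                set_idx = len(sets)   # start a new feature point set
                sets.append([])
            sets[set_idx].append((i, k))
            cur_track[k] = set_idx
        prev_track = cur_track
    return sets
```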
  • Step 206 Determine representative feature points from the feature point set, and calculate the difference between the feature descriptors corresponding to the remaining image feature points in the feature point set and the feature descriptors corresponding to the representative feature points.
  • The representative feature point refers to the image feature point in the feature point set that can represent the feature point set.
  • the remaining image feature points in the feature point set refer to the image feature points in the feature point set except the representative feature points.
  • a feature point set includes a total of four image feature points A1, B1, C1 and D1, where A1 is the representative feature point, and B1, C1 and D1 are the remaining image feature points.
  • the server may randomly select an image feature point from each feature point set as the representative feature point of each feature point set.
  • the server may calculate the average feature point of each feature point set, and determine the image feature point closest to the respective average feature point in each feature point set as the representative feature point.
  • a representative feature point can be determined in each feature point set.
  • Specifically, for each feature point set, the server can calculate the absolute difference between the feature descriptor corresponding to each remaining image feature point in the feature point set and the feature descriptor corresponding to the representative feature point, and obtain the difference corresponding to each remaining image feature point.
  • the server may calculate the square of the absolute difference to obtain the corresponding difference of each remaining image feature point.
  • Step 208 Determine the position error of the feature point set based on the calculated difference, and iteratively update the remaining image feature points in the feature point set based on the position error. When the iteration stop condition is met, the updated feature point set is obtained.
  • The iteration stop condition may be, for example, one of the following: the position error reaches a minimum value, the number of iterations reaches a preset number, or the iteration duration reaches a preset duration.
  • Each feature point set determines one spatial feature point. In order to improve the accuracy of the determined spatial feature points, the overall position error of the feature point set needs to be reduced. Based on this, in this embodiment, for each feature point set, the server can aggregate the differences corresponding to the remaining image feature points in the feature point set, determine the position error of the feature point set based on the aggregated differences, and iteratively update the positions of the image feature points other than the representative feature point in the direction that minimizes the position error. Each update is equivalent to optimizing the positions of those image feature points; the position error is then recalculated based on the feature descriptors at the optimized positions and the next update is performed.
  • When the iteration stop condition is met, the updated image feature points and the representative feature point belonging to the same feature point set form the updated feature point set.
  • the gradient descent algorithm can be used to update the positions of image feature points.
  • In order to prevent degeneration during the optimization process, the server can calculate the singular values of the Hessian matrix for the feature point set; if the largest singular value divided by the smallest singular value is greater than a preset threshold, the update is abandoned.
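A rough sketch of one gradient-descent update of the non-representative point positions with the Hessian-based degeneracy check; the finite-difference gradient/Hessian, learning rate and threshold are illustrative assumptions (in practice an analytic Jacobian of the descriptor map would normally be used).

```python
import numpy as np

def update_positions(positions, error_fn, lr=0.1, max_cond=1e4, eps=0.5):
    """One gradient-descent step on the positions of the remaining feature points.

    `positions` is an (M, 2) array of image positions, `error_fn(positions)`
    returns the scalar position error of the feature point set.
    """
    x = positions.reshape(-1).astype(np.float64)

    def f(v):
        return error_fn(v.reshape(positions.shape))

    # Finite-difference gradient of the position error.
    g = np.zeros_like(x)
    for a in range(x.size):
        d = np.zeros_like(x); d[a] = eps
        g[a] = (f(x + d) - f(x - d)) / (2 * eps)

    # Finite-difference Hessian, used only to detect degeneracy.
    h = np.zeros((x.size, x.size))
    for a in range(x.size):
        d = np.zeros_like(x); d[a] = eps
        gp, gm = np.zeros_like(x), np.zeros_like(x)
        for b in range(x.size):
            e = np.zeros_like(x); e[b] = eps
            gp[b] = (f(x + d + e) - f(x + d - e)) / (2 * eps)
            gm[b] = (f(x - d + e) - f(x - d - e)) / (2 * eps)
        h[a] = (gp - gm) / (2 * eps)

    # Abandon the update if the Hessian is badly conditioned (degenerate case).
    s = np.linalg.svd(h, compute_uv=False)   # singular values, descending
    if s[-1] <= 0 or s[0] / s[-1] > max_cond:
        return positions
    return (x - lr * g).reshape(positions.shape)
```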
  • Step 210 Based on the position of each image feature point in the updated feature point set in the corresponding image, determine the spatial feature points corresponding to the updated feature point set, and generate a feature map based on the spatial feature points.
  • The feature map is used to position, in the target scene, the moving device to be positioned.
  • the spatial feature points refer to the three-dimensional feature points, that is, the corresponding points of the feature points on the image in the three-dimensional space.
  • the feature map in this embodiment may be a data structure including multiple spatial feature points, and its specific form is not limited.
  • The moving device to be positioned refers to the moving device that needs to be positioned.
  • The moving device to be positioned and the moving device that sends the multi-frame images may be the same moving device, or they may be different moving devices.
  • The pose of the image to which an image feature point belongs refers to the pose of the camera when that frame of image was captured. This pose can be obtained by pose transformation based on the pose of the moving device at the same moment and the relative pose between the camera and the moving device.
  • Specifically, the server can perform triangulation based on the position of each image feature point in the updated feature point set in its corresponding image and the pose of the corresponding image, and obtain the spatial feature point corresponding to each feature point set. Further, the server can generate a feature map based on the spatial feature points and store the feature map; in the subsequent positioning process, the feature map can then be used to assist in positioning the moving device to be positioned.
  • Triangulation is an existing method for mapping two-dimensional image feature points to three-dimensional spatial feature points, and is not elaborated here. It can be understood that the descriptor of a spatial feature point may be the average of the descriptors of all the image feature points that generate the spatial feature point.
  • In one embodiment, the server can determine the pose of the image to which an image feature point belongs through the following steps. First, obtain the relative pose between the moving device and the camera, which usually remains unchanged during the movement of the moving device and can be obtained through calibration. Then, determine the pose of the moving device at each moment based on the inertial measurement data and speed measurement data uploaded by the moving device. After that, align the poses of the moving device with the collection times of the multi-frame images; alignment here means determining, for each frame of image, the pose of the moving device whose data collection time (the time at which the inertial measurement data and speed measurement data were collected) is the same as the acquisition time of that frame of image, or the same within the error tolerance.
  • The pose of each frame of image can then be obtained by pose transformation based on the pose of the moving device corresponding to that frame of image and the relative pose between the moving device and the camera.
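A minimal triangulation sketch using OpenCV; the choice of using only the first two observations, the world-to-camera pose convention and the pinhole intrinsic matrix K are assumptions made for the example.

```python
import cv2
import numpy as np

def triangulate_feature_set(observations, poses, K):
    """Triangulate one feature point set into a spatial feature point.

    `observations` maps frame index -> (x, y) image position of the set's point,
    `poses` maps frame index -> (R, t) world-to-camera pose, and K is the 3x3
    intrinsic matrix. At least two observations are assumed; a full
    implementation would use all views in a least-squares sense.
    """
    (i, pi), (k, pk) = list(observations.items())[:2]
    Ri, ti = poses[i]
    Rk, tk = poses[k]
    Pi = K @ np.hstack([Ri, ti.reshape(3, 1)])   # 3x4 projection matrix, frame i
    Pk = K @ np.hstack([Rk, tk.reshape(3, 1)])   # 3x4 projection matrix, frame k
    xi = np.asarray(pi, dtype=np.float64).reshape(2, 1)
    xk = np.asarray(pk, dtype=np.float64).reshape(2, 1)
    X_h = cv2.triangulatePoints(Pi, Pk, xi, xk)  # 4x1 homogeneous point
    return (X_h[:3] / X_h[3]).ravel()            # 3D spatial feature point
```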
  • In the above feature map generation method, multiple frames of images captured of the target scene are obtained; image feature points are extracted from each frame of image, and the corresponding feature descriptors are determined based on the positions of the extracted image feature points in their images; the image feature points with matching relationships among the image feature points of the frames are formed into a feature point set; a representative feature point is determined from the feature point set, the differences between the feature descriptors corresponding to the remaining image feature points in the feature point set and the feature descriptor corresponding to the representative feature point are calculated, the position error of the feature point set is determined based on the calculated differences, and the remaining image feature points in the feature point set are iteratively updated based on the position error.
  • When the iteration stop condition is met, the updated feature point set is obtained; based on the position of each image feature point in the updated feature point set in its corresponding image, the spatial feature point corresponding to the updated feature point set is determined, and the feature map is generated based on the spatial feature points.
  • Since the positions of the image feature points are optimized based on their feature descriptors in the process of generating the feature map, the generated feature map is more robust, and the positioning accuracy is greatly improved when this feature map is used in the positioning process.
  • In one embodiment, determining the position error of the feature point set based on the calculated differences includes: using each remaining image feature point in the feature point set as a target feature point, and calculating the matching confidence between each target feature point and the representative feature point;
  • calculating the position error corresponding to each target feature point based on the matching confidence and difference corresponding to that target feature point; and aggregating the position errors corresponding to the target feature points to obtain the position error of the feature point set.
  • The matching confidence between a target feature point and the representative feature point is used to characterize the degree of matching between the target feature point and the representative feature point.
  • Specifically, the server can use each remaining image feature point in the feature point set as a target feature point. For each target feature point, the server can calculate the matching confidence between the target feature point and the representative feature point, and then multiply the matching confidence by the difference to obtain the position error corresponding to the target feature point. Finally, the position errors corresponding to the target feature points are aggregated to obtain the position error of the feature point set.
  • The aggregation can be one of summing, calculating the average value, or calculating the median value.
  • Specifically, the server can calculate the position error of the feature point set with reference to the following formula (1):

$$E_j = \sum_{u,v} w_{uv}\,\bigl\| F_{i(u)}[p_u] - F_{k(v)}[p_v] \bigr\|^{2} \tag{1}$$

  • where j denotes the j-th feature point set, E_j is the position error of the j-th feature point set, u and v denote image feature points (the sum is taken over the matched pairs of the j-th feature point set, for example the representative feature point paired with each remaining target feature point), i(u) denotes the frame i to which the u-th image feature point belongs, k(v) denotes the frame k to which the v-th image feature point belongs, w_uv is the matching confidence, p_u is the position of image feature point u in its image, p_v is the position of image feature point v in its image, F_i(u)[p_u] is the descriptor at p_u, and F_k(v)[p_v] is the descriptor at p_v.
  • In this embodiment, by introducing the matching confidence, the position error of each image feature point is made more accurate, and the position error of the feature point set obtained by aggregating the position errors of the image feature points is in turn more accurate, so that a higher-precision feature map can be obtained, further improving the positioning accuracy.
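A small sketch of the position-error aggregation described above (cf. formula (1)); using squared descriptor differences and summation follows one of the embodiments, while mean or median aggregation would work equally well.

```python
import numpy as np

def set_position_error(desc_rep, descs_rest, confidences):
    """Position error of one feature point set.

    `desc_rep` is the descriptor of the representative feature point sampled at
    its current position, `descs_rest` the (M, D) descriptors of the remaining
    points at their current positions, and `confidences` the matching
    confidences w_uv of the remaining points.
    """
    diffs = np.linalg.norm(descs_rest - desc_rep, axis=1) ** 2  # squared descriptor differences
    return float(np.sum(confidences * diffs))                   # confidence-weighted sum
```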
  • In one embodiment, calculating the matching confidence between each target feature point and the representative feature point includes: obtaining the feature descriptor of each target feature point and the feature descriptor of the representative feature point; and calculating the vector similarity between the feature descriptor of each target feature point and the feature descriptor of the representative feature point, as the matching confidence between the corresponding target feature point and the representative feature point.
  • vector similarity is used to describe the degree of similarity between two vectors.
  • the feature descriptor is in the form of a vector, so the vector similarity can be calculated.
  • the vector similarity may be, for example, cosine similarity.
  • Specifically, the server can obtain the feature descriptor of each target feature point and the feature descriptor of the representative feature point, and then calculate the vector similarity between the feature descriptor of each target feature point and the feature descriptor of the representative feature point; each vector similarity is used as the matching confidence of the corresponding target feature point.
  • For example, assuming that a certain feature point set includes image feature points A1, B1 and C1, and C1 is the representative feature point, the feature descriptors of A1, B1 and C1 can be obtained respectively; the vector similarity between the feature descriptor of image feature point A1 and that of representative feature point C1 is calculated as the matching confidence between A1 and C1, and the vector similarity between the feature descriptor of image feature point B1 and that of representative feature point C1 is calculated as the matching confidence between B1 and C1.
  • In this embodiment, the vector similarity between feature descriptors is calculated as the matching confidence; since the feature descriptors describe the image feature points, the obtained matching confidence is more accurate.
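A sketch of computing the matching confidences as cosine similarities between descriptors, as described above; the small epsilon guard against zero-length descriptors is an implementation detail added here.

```python
import numpy as np

def matching_confidences(rep_desc, target_descs):
    """Matching confidence of each target feature point w.r.t. the representative
    point, computed as the cosine similarity between their feature descriptors."""
    rep = rep_desc / (np.linalg.norm(rep_desc) + 1e-12)
    tgt = target_descs / (np.linalg.norm(target_descs, axis=1, keepdims=True) + 1e-12)
    return tgt @ rep   # one confidence value per remaining (target) feature point
```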
  • In one embodiment, determining the representative feature point from the feature point set includes: calculating the average feature point position corresponding to the feature point set based on the position of each image feature point in the feature point set in its corresponding image; determining the image feature point whose distance from the average feature point position satisfies the distance condition; and using the determined image feature point as the representative feature point.
  • the distance condition includes one of the following: the distance to the average feature point position is less than or equal to the distance threshold, or the sorting is before the sorting threshold when sorted in ascending order by the distance to the average feature point position.
  • Specifically, the server can obtain the position of each image feature point in the feature point set in its corresponding image, add the position values of the same dimension and then average them to obtain the target value of that dimension, and determine the average feature point position corresponding to the feature point set from the target values of the dimensions.
  • For example, assume that a feature point set includes image feature points A1, B1 and C1, where the position of A1 in its image is (x1, y1), the position of B1 in its image is (x2, y2), and the position of C1 in its image is (x3, y3); then the average feature point position corresponding to the feature point set is ((x1+x2+x3)/3, (y1+y2+y3)/3).
  • For each feature point set, after the server calculates the average feature point position corresponding to the feature point set, it can calculate the distance between the position of each image feature point in the feature point set and the average feature point position, screen out the image feature point that meets the distance condition according to the calculated distances, and determine the screened image feature point as the representative feature point.
  • In one embodiment, if the distance condition is that the distance to the average feature point position is less than or equal to the distance threshold, then after calculating the distance between each image feature point and the average feature point position of the feature point set to which it belongs, the server compares each distance with the distance threshold. If the distance between only one image feature point and the average feature point position is less than the distance threshold, that image feature point is determined as the representative feature point. If the distances between multiple image feature points and the average feature point position are less than the distance threshold, one of these image feature points can be selected as the representative feature point; for example, the image feature point with the smallest distance can be selected.
  • In one embodiment, if the distance condition is that the image feature point is sorted before the sorting threshold when sorted in ascending order of distance to the average feature point position, then after calculating the distance between each image feature point and the average feature point position of the feature point set to which it belongs, the server can sort the image feature points in ascending order of distance and select the representative feature point from the image feature points sorted before the sorting threshold. For example, if the sorting threshold is 2, the image feature point sorted first can be selected as the representative feature point.
  • In this embodiment, the average feature point position corresponding to the feature point set is calculated, the image feature point whose distance to the average feature point position satisfies the distance condition is determined, and the determined image feature point is used as the representative feature point.
  • The representative feature point determined in this way can better reflect the overall position characteristics of the feature point set.
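A sketch of selecting the representative feature point as the image feature point closest to the average feature point position, which is one instance of the distance conditions described above.

```python
import numpy as np

def pick_representative(positions):
    """Pick the representative feature point of one set.

    `positions` is an (M, 2) array of image positions; the point whose position
    is closest to the mean position of the set is returned (by index)."""
    mean_pos = positions.mean(axis=0)                       # average feature point position
    dists = np.linalg.norm(positions - mean_pos, axis=1)    # distance of each point to the mean
    return int(np.argmin(dists))                            # index of the representative point
```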
  • In one embodiment, there are multiple feature point sets, and determining the representative feature point from the feature point set includes: for each feature point set, filtering out the feature point set if it satisfies the filtering condition, and entering the step of determining the representative feature point from the feature point set when it does not satisfy the filtering condition.
  • The filtering condition includes at least one of the following: the distance between the initial spatial feature point calculated based on the feature point set and the device that captured the multi-frame images is greater than a first preset distance threshold; the distance between the initial spatial feature point calculated based on the feature point set and the device that captured the multi-frame images is less than a second preset distance threshold, where the second preset distance threshold is less than the first preset distance threshold; the disparity calculated based on the feature point set is greater than a preset disparity threshold; the average reprojection error calculated based on the feature point set is greater than a preset error threshold.
  • the initial spatial feature points refer to the spatial feature points determined based on the position of each image feature point in the image to which it belongs in the unupdated feature point set. Filtering a feature point set means removing the feature point set from multiple feature point sets.
  • Specifically, for each feature point set, the server can calculate the initial spatial feature point based on the feature point set and then calculate the distance between the initial spatial feature point and the device that captured the multi-frame images. If the distance is greater than the first preset distance threshold, that is, the spatial feature point is too far from the shooting device, the feature point set is filtered out; if the distance is smaller than the second preset distance threshold, that is, the spatial feature point is too close to the shooting device, the feature point set is filtered out, where the second preset distance threshold is smaller than the first preset distance threshold.
  • In one embodiment, the server can also perform disparity calculation based on the feature point set; if the calculated disparity is greater than the preset disparity threshold, the feature point set is filtered out.
  • In one embodiment, the server can also project the initial spatial feature point calculated based on the feature point set onto the image to which each image feature point in the feature point set belongs, calculate the distance between each image feature point and the projected feature point on its image to obtain the projection distances, and then calculate the average of the projection distances to obtain the average reprojection error. If the average reprojection error is greater than the preset error threshold, the feature point set is filtered out.
  • the filtering conditions in the filtering process may also be part of the above-mentioned conditions, and the order of filtering according to each filtering condition may not be limited to the above order.
  • For the feature point sets that are not filtered out, the server can enter the above step of "determining the representative feature point from the feature point set", so as to optimize the positions of the image feature points of these feature point sets through the method provided in the above embodiments and obtain the updated feature point sets.
  • The spatial feature point corresponding to each updated feature point set is then determined, and multiple spatial feature points are obtained to generate the feature map.
  • In this embodiment, filtering conditions are set to filter out the feature point sets that meet the filtering conditions, thereby further improving the robustness of the feature map and further improving the positioning accuracy when the feature map is used for assisted positioning.
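A sketch of the filtering step just described; the threshold values and the way the camera distances, parallax and reprojection errors are supplied are illustrative assumptions, only the four conditions themselves come from the text.

```python
import numpy as np

def should_filter(point_3d, cam_centers, reproj_errors, parallax_deg,
                  d_max=80.0, d_min=0.5, parallax_max=60.0, err_max=2.0):
    """Decide whether a feature point set should be filtered out.

    `point_3d` is the initial spatial feature point, `cam_centers` the camera
    centres of the frames that observe it, `reproj_errors` the per-view
    reprojection distances, and `parallax_deg` the computed disparity.
    All thresholds are placeholder values, not values from the patent.
    """
    dists = np.linalg.norm(cam_centers - point_3d, axis=1)
    if np.any(dists > d_max):             # too far from the shooting device
        return True
    if np.any(dists < d_min):             # too close to the shooting device
        return True
    if parallax_deg > parallax_max:       # disparity greater than the preset threshold
        return True
    if np.mean(reproj_errors) > err_max:  # average reprojection error too large
        return True
    return False
```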
  • generating a feature map based on spatial feature points includes:
  • Step 402 Based on the respective feature descriptors of each image feature point in the updated feature point set, determine the average descriptor corresponding to the updated feature point set.
  • Specifically, the server can calculate the average descriptor corresponding to the feature point set with reference to the following formula (2):

$$u_j = \frac{1}{|j|}\sum_{f \in j} f, \qquad u_j \in \mathbb{R}^{D} \tag{2}$$

  • where u_j is the average descriptor, j denotes the j-th (updated) feature point set, f is the descriptor of an image feature point in the j-th feature point set, and R^D denotes the D-dimensional real number space.
  • Step 404 From the feature descriptors of the image feature points in the updated feature point set, select a feature descriptor whose similarity to the average descriptor satisfies the similarity condition, and use the selected feature descriptor as the reference descriptor.
  • the similarity condition may be one of the following: the similarity is greater than a preset similarity threshold or the sorting is before the sorting threshold when arranged in descending order of similarity.
  • In one embodiment, if the similarity condition is that the similarity is greater than the preset similarity threshold, then for each updated feature point set, after calculating the average descriptor corresponding to the feature point set, the server calculates the similarity between the feature descriptor of each image feature point in the feature point set and the average descriptor, and compares each similarity with the similarity threshold. If the similarity corresponding to only one image feature point is greater than the preset similarity threshold, the feature descriptor of that image feature point is determined as the reference descriptor. If the similarities corresponding to multiple image feature points are greater than the preset similarity threshold, one of the feature descriptors of these image feature points can be selected as the reference descriptor; for example, the feature descriptor with the greatest similarity can be selected.
  • In one embodiment, if the similarity condition is that the feature descriptor is sorted before the sorting threshold when the feature descriptors are arranged in descending order of similarity, then for each updated feature point set, after the server calculates the similarity between the feature descriptor of each image feature point in the feature point set and the average descriptor, it can arrange the feature descriptors of the image feature points in descending order of similarity and select the reference descriptor from the feature descriptors sorted before the sorting threshold. For example, if the sorting threshold is 2, the feature descriptor sorted first can be selected as the reference descriptor.
  • In one embodiment, the server can calculate the reference descriptor with reference to the following formula (3):

$$f_j = \arg\max_{f \in j}\; \mathrm{sim}(f, u_j) \tag{3}$$

  • where f_j is the reference descriptor, j denotes the j-th (updated) feature point set, u_j is the average descriptor, f denotes the feature descriptor of an image feature point in the j-th feature point set, and sim(·,·) denotes the vector similarity between two descriptors.
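A sketch combining formulas (2) and (3): compute the average descriptor of the set and pick the most similar feature descriptor as the reference descriptor; using cosine similarity here is an assumption, the embodiment only requires some vector similarity measure.

```python
import numpy as np

def reference_descriptor(descs):
    """Average descriptor (formula (2)) and reference descriptor (formula (3))
    of one updated feature point set; `descs` is an (M, D) array of feature
    descriptors."""
    u_j = descs.mean(axis=0)                                  # average descriptor
    sims = (descs @ u_j) / (np.linalg.norm(descs, axis=1)
                            * np.linalg.norm(u_j) + 1e-12)    # cosine similarity to the mean
    f_j = descs[int(np.argmax(sims))]                         # most similar descriptor
    return u_j, f_j
```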
  • Step 406 Project the spatial feature points onto the image to which each image feature point in the updated feature point set belongs, obtain multiple projected feature points, and determine the feature descriptor corresponding to the projected feature point based on the position of the projected feature point on the associated image. .
  • Step 408 Based on the difference between the feature descriptor corresponding to the projected feature point and the reference descriptor, determine the reprojection error corresponding to the projected feature point.
  • Step 410 Aggregate the reprojection errors corresponding to the projected feature points to obtain the target error, iteratively update the spatial feature point based on the target error, obtain the target spatial feature point corresponding to the updated feature point set when the iteration stop condition is met, and generate the feature map based on the target spatial feature points.
  • Specifically, for each updated feature point set, the server can project the corresponding spatial feature point onto the image to which each image feature point in the feature point set belongs, to obtain the multiple projected feature points corresponding to the spatial feature point. Further, based on the position of each projected feature point on its image, the feature descriptor corresponding to each projected feature point can be determined, and the difference between the feature descriptor of each projected feature point and the reference descriptor of the updated feature point set calculated in step 404 is then computed to obtain the reprojection error corresponding to each projected feature point.
  • The reprojection errors are aggregated to obtain the target error corresponding to the updated feature point set, and the spatial feature point corresponding to the updated feature point set is iteratively updated in the direction that minimizes the target error; that is, the updated spatial feature point is taken as the current spatial feature point and step 406 is entered again, repeating steps 406 to 410 until the iteration stop condition is met. The spatial feature point obtained at that time is the target spatial feature point, and a feature map can then be generated based on the target spatial feature points.
  • the iteration stop condition may be one of the following: the target error reaches a minimum value, the number of iterations reaches a preset number, or the iteration duration reaches a preset duration.
  • In one embodiment, the target error can be calculated with reference to the following formula (4):

$$E_j^{3D} = \sum_{i \in Z(j)} \bigl\| F_i\bigl[\pi_{C_i}(R_i P_j + t_i)\bigr] - f_j \bigr\|^{2} \tag{4}$$

  • where j denotes the j-th (updated) feature point set, Z(j) denotes the set of images to which the image feature points in the j-th feature point set belong, i denotes the i-th frame of image, C_i denotes the camera of the i-th frame of image and π_{C_i} denotes projection onto its image plane, P_j is the spatial feature point corresponding to the j-th feature point set, R_i is the rotation matrix corresponding to the i-th frame of image, t_i is the translation matrix corresponding to the i-th frame of image, F_i is the descriptor map of the i-th frame of image, and f_j is the reference descriptor corresponding to the j-th feature point set.
  • In this embodiment, the spatial feature point is projected onto the image to which each image feature point in the updated feature point set belongs to obtain multiple projected feature points; the feature descriptor of each projected feature point is determined based on its position on its image; the reprojection errors corresponding to the projected feature points are aggregated to obtain the target error; and the spatial feature point is iteratively updated based on the target error. When the iteration stop condition is met, the target spatial feature point is obtained, realizing position optimization of the spatial feature point.
  • When the feature map generated based on the optimized target spatial feature points is used for positioning, the positioning accuracy can be further improved.
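A sketch of the target error in the spirit of formula (4); the pinhole projection with intrinsics K, the nearest-pixel sampling of the dense descriptor map F_i, and the variable names are assumptions made for the example (a real implementation would typically sample the descriptor map bilinearly).

```python
import numpy as np

def target_error(P_j, f_j, views, K):
    """Descriptor reprojection (target) error of one spatial feature point.

    `views` is a list of (R_i, t_i, F_i), where F_i is the dense descriptor map
    of frame i with shape (H, W, D); P_j is the 3D spatial feature point, f_j
    the reference descriptor of its feature point set, K the 3x3 intrinsics.
    """
    err = 0.0
    for R_i, t_i, F_i in views:
        p_cam = R_i @ P_j + t_i                       # point in camera coordinates
        uvw = K @ p_cam
        u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]       # projected pixel position
        r = int(round(np.clip(v, 0, F_i.shape[0] - 1)))
        c = int(round(np.clip(u, 0, F_i.shape[1] - 1)))
        err += float(np.sum((F_i[r, c] - f_j) ** 2))  # squared descriptor difference
    return err
```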
  • In one embodiment, the multi-frame images are captured by a camera installed on the target moving device; the above feature map generation method further includes: obtaining the inertial measurement data and speed measurement data of the target moving device when capturing the multi-frame images, and using the inertial measurement data and speed measurement data to calculate the initial pose of the target moving device; determining pre-integration information based on the inertial measurement data, constructing a factor graph based on the pre-integration information and the speed measurement data, and adjusting the initial pose based on the factor graph.
  • In one embodiment, extracting image feature points from each frame of image and determining the corresponding feature descriptors based on the positions of the extracted image feature points in the corresponding images includes: inputting the image into a trained feature extraction model, and outputting, through the feature extraction model, a first tensor corresponding to the image feature points and a second tensor corresponding to the feature descriptors, where the first tensor is used to describe the possibility of feature points appearing in each area of the image; performing non-maximum suppression processing on the image based on the first tensor to determine the image feature points of the image; and converting the second tensor into a third tensor consistent with the image size, and determining the vector in the third tensor that matches the position of an image feature point in its image as the descriptor corresponding to that image feature point.
  • Specifically, the server inputs the image into the trained feature extraction model, and the feature extraction model outputs the first tensor corresponding to the image feature points and the second tensor corresponding to the feature descriptors.
  • The first tensor and the second tensor are both multi-channel tensors, and the size of each channel is smaller than that of the original input image.
  • The value at each position in the first tensor is used to describe the possibility of a feature point appearing in the corresponding area of the original input image, that is, a probability value.
  • For example, if the original input image is H x W, the first tensor can be H/N1 x W/N1 x X1 and the second tensor can be H/N2 x W/N2 x X2, where N1, N2, X1 and X2 are all positive integers greater than 1.
  • In one embodiment, the server may first convert the first tensor into a probability map with the same size as the input image, search for local maxima in the probability map, and determine the locations of the local maxima as target locations. Since the probability map and the input image have the same size, the pixels at the same locations as the target locations in the input image can be directly determined as the image feature points of the input image.
  • In another embodiment, the server can implement this through the following steps:
  • The server can search for the maximum value along the channel dimension of the first tensor, that is, search for the maximum value over the channels at each pixel position.
  • The maximum value found at each pixel position is used as the value at the corresponding position in the third tensor, so that the third tensor is obtained.
  • The channel index of the maximum value found at each pixel position is used as the value at the corresponding position in the fourth tensor, so that the fourth tensor is obtained.
  • The neighborhood in which the target value is located includes multiple target positions, and the image distance between the position in the image corresponding to each target position and the position in the image corresponding to the target value is less than the preset distance threshold.
  • Specifically, the server can sort the values in the third tensor from small to large to obtain a value set, and then traverse the values in the value set in sequence. For a traversed value, determine whether it is less than the preset threshold; if it is less than the preset threshold, continue to traverse the next value; if it is greater than the preset threshold, determine the traversed value as the target value and search the neighborhood in which the target value is located in the third tensor. Since the size of the third tensor is reduced relative to the original input image while image feature points refer to pixels in the input image, the neighborhood in which the target value is located needs to be determined based on the position in the original input image corresponding to the pixel position of the target value in the third tensor.
  • That is to say, the neighborhood of the position of the target value includes multiple target positions, and the image distance between the position in the input image corresponding to each target position and the position in the input image corresponding to the target value is less than the preset distance threshold; in other words, the position in the input image corresponding to each target position falls within the neighborhood range of the position in the image corresponding to the target value.
  • For example, as shown in Figure 5, assume that the location of the target value is point A and the corresponding position of point A in the input image is point B. If the dotted box in Figure 5 represents the neighborhood of point B, then the position in the input image corresponding to each target position in the neighborhood of point A in the third tensor falls within this dotted box.
  • It should be noted that the position in the original image corresponding to a pixel position in the third tensor is related to the channel index at that pixel position.
  • For a pixel position (i, j) in the third tensor, the index value at the corresponding position in the fourth tensor is D[i, j]; its corresponding position in the original image is then (N x i + D[i, j] / 8, N x j + D[i, j] % 8), where N is the reduction ratio of the third tensor relative to the original input image and the division is integer division.
  • For example, if the original input image is 640 x 480, the first tensor is 80 x 60 x 64, the second tensor is 80 x 60 x 256, the third tensor is 80 x 60 (each value is the maximum of the first tensor over its 64 channels, decimal type), and D is 80 x 60 (each value is the channel index corresponding to that maximum, integer type).
  • Since the 64 channels of the first tensor correspond to the pixels of each 8 x 8 area of the original image, the coordinate (32, 53, 35) of the first tensor corresponds to the pixel (32 x 8 + 35 / 8, 53 x 8 + 35 % 8) = (260, 427) in the original image, with integer division.
  • In this way, the distance between two pixel positions in the third tensor can be calculated as the distance between their corresponding positions in the original input image. For example, for a pixel position (i, j) and another pixel position (i+n, j+n) in the third tensor, the distance between the two corresponding pixel positions in the original image can be obtained from the positions (i, j) and (i+n, j+n) together with the index values at those positions in the fourth tensor.
  • The target pixel point corresponding to the position of the target value in the image is determined as an image feature point of the image.
  • The target pixel point is determined from the image based on the location of the target value and the corresponding channel index value, and the channel index value is determined from the fourth tensor based on the location of the target value.
  • For example, assuming that the pixel position coordinates of a certain target value in the third tensor are (i, j), the corresponding position in the fourth tensor is also (i, j), and the pixel whose coordinates in the original input image are (N x i + D[i, j] / 8, N x j + D[i, j] % 8) is determined as the target pixel point corresponding to the position of the target value, where N is the reduction ratio of the third tensor relative to the original input image.
  • For example, the output of the fifteenth convolution block of the feature extraction model is the feature point tensor A with dimensions H/8 x W/8 x 64, and the output on the right is the descriptor tensor B with dimensions H/8 x W/8 x 256.
  • If C[i, j] is less than a certain threshold (for example, 0.05), it is skipped.
  • If step 5 is completed and C[i, j] is greater than every C[i+n, j+n] (and C[i-n, j-n]) in its neighborhood, then C[i, j] and the coordinate (i x 8 + D[i, j] / 8, j x 8 + D[i, j] % 8) are placed into the target set F.
  • Finally, the corresponding descriptors are found in the tensor G according to the result of the target set F; that is, for the subscript of each image feature point in the target set F, the position with the same subscript is found in the tensor G, and the vector composed of the values of the channels at that position is used as the feature descriptor of the image feature point; the feature descriptor is a 256-dimensional vector. For example, for the subscript (10, 13) of an image feature point in the target set F, the position (10, 13) is found in the tensor G, and the vector composed of the values of the channels at that position is determined as the feature descriptor of the image feature point.
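A sketch of decoding image feature points and descriptors from the model outputs A and B; the 3x3-cell non-maximum suppression and taking the descriptor from the same cell of B are simplifying assumptions that stand in for the neighbourhood comparison and the tensor G lookup described above.

```python
import numpy as np

def decode_feature_points(A, B, thresh=0.05, ):
    """Decode image feature points and descriptors from the model outputs.

    A is the feature-point tensor (H/8 x W/8 x 64), B the descriptor tensor
    (H/8 x W/8 x 256). C holds the per-cell maxima, D the channel index of each
    maximum; a channel index d in an 8x8 cell maps to the original-image pixel
    (8*i + d // 8, 8*j + d % 8).
    """
    C = A.max(axis=2)                 # third tensor: max score per 8x8 cell
    D = A.argmax(axis=2)              # fourth tensor: channel index of the max
    points, descriptors = [], []
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            if C[i, j] < thresh:
                continue              # skip low-probability cells
            i0, i1 = max(0, i - 1), min(C.shape[0], i + 2)
            j0, j1 = max(0, j - 1), min(C.shape[1], j + 2)
            if C[i, j] < C[i0:i1, j0:j1].max():
                continue              # not a local maximum in its neighbourhood
            d = int(D[i, j])
            y = 8 * i + d // 8        # row of the feature point in the original image
            x = 8 * j + d % 8         # column of the feature point in the original image
            points.append((y, x))
            descriptors.append(B[i, j])   # 256-dim descriptor taken from the same cell
    return np.array(points), np.array(descriptors)
```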
  • In one embodiment, obtaining multiple frames of images captured of the target scene includes: obtaining multiple frames of original images of the target scene captured by a fisheye camera, and performing distortion correction on the multiple frames of original images to obtain the multiple frames of images captured of the target scene.
  • the multi-frame images of the target scene obtained by the server are taken by a fisheye camera, and the fisheye camera imaging model is approximately a unit sphere projection model.
  • the fisheye camera imaging process is decomposed into two steps: first, linearly projecting the three-dimensional space points onto the virtual unit sphere; and then projecting the points on the unit sphere onto the image plane. This process is nonlinear.
  • the design of the fisheye camera introduces distortion, so the image produced by the fisheye camera has distortion, among which the radial distortion is very serious, so the distortion model mainly considers the radial distortion.
  • the projection function of the fisheye camera is designed to project a huge scene onto a limited image plane as much as possible.
  • the design models of fisheye cameras can be roughly divided into four types: the equidistant projection model, the equisolid angle projection model, the orthogonal projection model and the stereographic projection model.
  • any one of these four models can be used to perform distortion correction on multiple frames of original images captured by a fisheye camera to obtain multiple frames of images captured for the target scene.
  • since the multi-frame images are captured by a fisheye camera, which has a larger viewing angle than a pinhole camera, more environmental information can be perceived and more image feature points can be extracted, thereby further improving the robustness of the generated feature map and improving the positioning accuracy.
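One possible way to perform the fisheye distortion correction mentioned above is OpenCV's fisheye module, which implements an equidistant-type projection model; this is a hedged sketch, and the calibration values below are placeholders rather than parameters from the original disclosure.

```python
import cv2
import numpy as np

# Placeholder intrinsics and fisheye distortion coefficients (k1..k4) that
# would normally come from a prior calibration of the fisheye camera.
K = np.array([[285.0, 0.0, 320.0],
              [0.0, 285.0, 240.0],
              [0.0, 0.0, 1.0]])
dist_coeffs = np.array([0.02, -0.01, 0.003, -0.001]).reshape(4, 1)

raw = cv2.imread("frame_0001.png")   # one original fisheye frame (placeholder path)
# cv2.fisheye removes the (mainly radial) distortion of the raw frame.
undistorted = cv2.fisheye.undistortImage(raw, K, dist_coeffs, Knew=K)
cv2.imwrite("frame_0001_undistorted.png", undistorted)
```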
  • a schematic flow chart for determining positioning information using the feature map generated by the embodiment of the present application includes the following steps:
  • Step 702 Obtain the inertial measurement data and speed measurement data of the moving device to be positioned and the target image captured by the moving device in the target scene, and use the inertial measurement data and speed measurement data to determine the initial pose of the moving device to be positioned.
  • the inertial measurement data can be data measured by an inertial measurement unit (IMU).
  • the speed measurement data may be data measured by a speed sensor.
  • the speed measurement data may be data measured by a wheel speed meter.
  • the inertial measurement data and speed measurement data here are data measured when the moving equipment to be positioned moves in the target scene.
  • the server can receive the inertial measurement data and speed measurement data sent by the moving device to be positioned, and the target image captured by the moving device to be positioned in the target scene, and, based on a preset kinematic model, use the inertial measurement data and speed measurement data to calculate the initial pose of the moving device to be positioned.
  • the preset kinematic model can reflect the relationship between vehicle position, speed, acceleration, etc. and time; this embodiment does not limit the specific form of the model, and in practical applications it can be set reasonably according to needs, for example by making improvements on an existing bicycle model to obtain the desired model.
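A minimal dead-reckoning sketch of how a preset kinematic model might combine IMU yaw-rate and wheel-speed measurements into an initial pose; the patent leaves the exact model open, so this simplified planar integration and the measurement values are assumptions made for illustration.

```python
import math

def propagate_pose(x, y, yaw, yaw_rate, speed, dt):
    """One dead-reckoning step: yaw rate from the IMU, speed from the
    wheel-speed sensor, integrated under a simple planar kinematic model."""
    yaw += yaw_rate * dt
    x += speed * math.cos(yaw) * dt
    y += speed * math.sin(yaw) * dt
    return x, y, yaw

# Hypothetical measurement stream: (yaw_rate [rad/s], speed [m/s], dt [s]).
measurements = [(0.01, 2.0, 0.05), (0.02, 2.1, 0.05), (0.00, 2.1, 0.05)]

pose = (0.0, 0.0, 0.0)       # initial pose in a local frame
for yaw_rate, speed, dt in measurements:
    pose = propagate_pose(*pose, yaw_rate, speed, dt)
print(pose)                  # rough initial pose used to query the feature map
```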
  • Step 704 Determine the matching spatial feature points from the generated feature map based on the initial pose, and obtain the target spatial feature points.
  • the server can find the matching spatial feature point from the feature map based on the position represented by the initial pose, as the target spatial feature point.
  • the feature map also stores the pose corresponding to each spatial feature point.
  • the pose corresponding to a spatial feature point may be the pose of the moving device when it captured the multiple frames of images during the process of generating the feature map.
  • the server can compare the initial pose of the moving device to be positioned with the poses corresponding to the spatial feature points, and determine the spatial feature point corresponding to the pose with the highest matching degree as the target spatial feature point.
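A small sketch of the pose-based lookup described above, assuming each stored spatial feature point carries the capture pose; the Euclidean gating radius and the map-entry layout are assumptions for illustration only.

```python
import numpy as np

def select_target_points(initial_pose, map_points, radius=15.0):
    """Pick spatial feature points whose stored capture pose lies close to the
    initial pose of the device to be positioned (a simple Euclidean gate on
    the translation part; the exact matching criterion is an assumption)."""
    px, py = initial_pose[:2]
    selected = []
    for pt in map_points:
        mx, my = pt["pose"][:2]
        if np.hypot(px - mx, py - my) < radius:
            selected.append(pt)
    return selected

# Hypothetical feature-map entries: 3D point, 256-dim descriptor, capture pose.
feature_map = [
    {"xyz": np.array([1.2, 0.4, 3.0]), "desc": np.random.rand(256), "pose": (0.5, 0.2, 0.0)},
    {"xyz": np.array([9.7, 4.1, 2.2]), "desc": np.random.rand(256), "pose": (42.0, 7.5, 1.1)},
]
targets = select_target_points((0.0, 0.0, 0.0), feature_map)
print(len(targets))   # -> 1
```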
  • Step 706 Determine the image feature points that match the target spatial feature points from the target image, form matching pairs from the determined image feature points and the target spatial feature points, and determine the positioning information of the moving device based on the matching pairs.
  • the server can compare the descriptor corresponding to the target spatial feature point with the feature descriptors corresponding to the image feature points on the target image, determine the image feature point corresponding to the feature descriptor with the highest similarity as the image feature point matching the target spatial feature point, form a matching pair from the determined image feature point and the target spatial feature point, and then determine the positioning information of the moving device based on the matching pair.
  • the descriptor corresponding to the target space feature point may be the average of the feature descriptors of each image feature point in the feature point set corresponding to the target space feature point.
  • the PnP algorithm may be used to determine the positioning information based on the matching pair.
  • the PnP algorithm is an existing method and will not be elaborated here.
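As a hedged illustration of the PnP step referred to above (the disclosure does not prescribe a specific implementation), the following sketch recovers a pose from 3D-2D matching pairs with OpenCV's solvePnPRansac; the intrinsics and points are synthetic placeholders.

```python
import cv2
import numpy as np

K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros((4, 1))      # assume the target image is already undistorted

# Hypothetical matching pairs: 3D target spatial feature points from the map
# and the 2D image feature points matched to them by descriptor similarity.
object_pts = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 6.0], [0.0, 1.0, 5.5],
                       [1.0, 1.0, 6.5], [-1.0, 0.5, 5.0], [0.5, -1.0, 6.0]])
rvec_true = np.array([[0.05], [0.10], [0.00]])
tvec_true = np.array([[0.20], [-0.10], [0.30]])
image_pts, _ = cv2.projectPoints(object_pts, rvec_true, tvec_true, K, dist)

# PnP with RANSAC recovers the camera pose from the 3D-2D matching pairs,
# which gives the positioning information of the moving device.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, dist)
print(ok, rvec.ravel(), tvec.ravel())
```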
  • determining the positioning information based on the matching pair specifically includes: projecting the spatial feature point in the matching pair onto the target image to obtain a projected feature point; calculating the reprojection error from the projected feature point and the image feature point in the matching pair; determining the pose corresponding to the minimum value of the least-squares function of the reprojection error as the corrected pose; and correcting the initial pose with the corrected pose to obtain the positioning information.
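A sketch of the reprojection-error refinement described above, using a generic least-squares solver; representing the pose as a 6-vector (rvec, tvec) and the choice of solver are assumptions, not the patent's stated method.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(pose6, object_pts, image_pts, K):
    """Residuals between the projections of the matched spatial feature points
    and the matched image feature points for a pose given as (rvec, tvec)."""
    rvec, tvec = pose6[:3].reshape(3, 1), pose6[3:].reshape(3, 1)
    proj, _ = cv2.projectPoints(object_pts, rvec, tvec, K, np.zeros((4, 1)))
    return (proj.reshape(-1, 2) - image_pts.reshape(-1, 2)).ravel()

def refine_pose(initial_pose6, object_pts, image_pts, K):
    """Least-squares minimisation of the reprojection error; the minimiser is
    taken as the corrected pose used to correct the initial pose."""
    result = least_squares(reprojection_residuals,
                           np.asarray(initial_pose6, dtype=float),
                           args=(object_pts, image_pts, K))
    return result.x          # corrected pose as (rvec, tvec) stacked into 6 values
```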
  • the server may return the positioning information to the moving device to be positioned.
  • since the positions of the image feature points are optimized based on their feature descriptors during generation of the feature map, the generated feature map is more robust, so the positioning accuracy when using the feature map in the positioning process is greatly improved.
  • the feature map generation method of this application can be applied to parking application scenarios, specifically including the following steps:
  • the target vehicle equipped with a fisheye camera can drive in the garage, and the environment in the garage is photographed through the fisheye camera, and multiple frames of original images are obtained and sent to the server.
  • the server performs distortion correction on the multiple frames of original images to obtain multiple frames of images captured for the target scene.
  • the target vehicle here and the vehicle to be parked may be the same vehicle or different vehicles.
  • the server can input each image into the trained feature extraction model and output, through the feature extraction model, the first tensor corresponding to the image feature points and the second tensor corresponding to the feature descriptors; the first tensor is used to describe the possibility of feature points appearing in each area of the image; perform non-maximum suppression processing on the image based on the first tensor to determine the image feature points of the image from the image; and convert the second tensor into a third tensor consistent with the image size, determining the vector in the third tensor at the position matching the image feature point's position in the image to which it belongs as the descriptor corresponding to the image feature point.
  • the first tensor includes multiple channels, and performing non-maximum suppression processing on the image based on the first tensor to determine the image feature points of the image from the image includes: obtaining the maximum value of the first tensor at each position along the direction of the multiple channels and the channel index corresponding to each maximum value, to obtain the third tensor and the fourth tensor respectively; determining a target value from the third tensor and searching the neighborhood of the location of the target value in the third tensor, where the neighborhood of the location of the target value includes multiple target positions and the image distance between the corresponding position of each target position in the image and the corresponding position of the target value in the image is less than a preset distance threshold; and, when the search result indicates that the target value is greater than the values corresponding to the other positions in the neighborhood, determining the target pixel point in the image corresponding to the location of the target value as an image feature point of the image, where the target pixel point is determined from the image based on the location of the target value and the corresponding channel index value, and the channel index value is determined from the fourth tensor based on the location of the target value.
  • the filter conditions include at least one of the following: the distance between the initial spatial feature points calculated based on the feature point set and the shooting device of the multi-frame image is greater than the first preset distance threshold; the initial spatial feature calculated based on the feature point set The distance between the point and the shooting device of the multi-frame image is less than the second preset distance threshold, and the second preset distance threshold is less than the first preset distance threshold; the disparity calculated based on the feature point set is greater than the preset disparity threshold; based on the features The average reprojection error calculated by the point set is greater than the preset error threshold.
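A compact sketch of how the filter conditions listed above could be checked for one feature point set; all thresholds are placeholder values, and defining the distance to the shooting device as the min/max over the involved camera centers is an assumption.

```python
import numpy as np

def should_filter(point_xyz, cam_centers, parallax_deg, mean_reproj_err,
                  d_max=40.0, d_min=0.5, parallax_max=60.0, err_max=2.0):
    """Return True if the feature point set should be filtered out: the
    triangulated initial spatial point is too far from or too close to the
    shooting device, the disparity (parallax) exceeds its threshold, or the
    average reprojection error exceeds its threshold."""
    dists = [float(np.linalg.norm(np.asarray(point_xyz) - np.asarray(c)))
             for c in cam_centers]
    if max(dists) > d_max or min(dists) < d_min:
        return True
    if parallax_deg > parallax_max:
        return True
    return mean_reproj_err > err_max
```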
  • the server determines the representative feature point from the feature point set through the following steps: based on the position of each image feature point in the feature point set in the image to which it belongs, calculate the average feature point position corresponding to the feature point set; determine, from the feature point set, an image feature point whose distance from the average feature point position satisfies the distance condition, and use the determined image feature point as the representative feature point; where the distance condition includes one of the following: the distance from the average feature point position is less than or equal to a distance threshold, or the image feature point is ranked before a ranking threshold when the image feature points are sorted in ascending order of distance from the average feature point position.
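A minimal sketch of the representative-feature-point selection described above, using the "ranked first in ascending distance from the average feature point position" variant of the distance condition; the point coordinates are placeholders.

```python
import numpy as np

def pick_representative(points_xy):
    """Choose the representative feature point of a feature point set as the
    member closest to the set's average feature point position."""
    pts = np.asarray(points_xy, dtype=float)
    mean_pos = pts.mean(axis=0)                     # average feature point position
    idx = int(np.argmin(np.linalg.norm(pts - mean_pos, axis=1)))
    return idx, pts[idx]

# e.g. positions of matched image feature points coming from three frames
print(pick_representative([(120.0, 48.0), (121.5, 47.0), (150.0, 80.0)]))
```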
  • calculating the matching confidence between each target feature point and the representative feature point respectively includes: obtaining the respective feature descriptor of each target feature point and obtaining the feature descriptor of the representative feature point; and respectively calculating the vector similarity between the feature descriptor of each target feature point and the feature descriptor of the representative feature point, with each vector similarity taken as the matching confidence between the corresponding target feature point and the representative feature point.
  • the server can use a gradient descent algorithm to update the positions of the remaining image feature points in the feature point set in the direction that minimizes the position error, determine the descriptors corresponding to the updated image feature points from the third tensor, then recalculate the position error, and repeat this process until the iteration stop condition is met.
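A sketch of the matching-confidence and position-error computation together with one gradient-descent-style position update described above; the numerical gradient, the learning rate, and the sample_desc descriptor-lookup callable (assumed to sample a frame's descriptor map at a 2D position) are illustrative assumptions, not the patent's stated formulation.

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def set_position_error(positions, rep_desc, sample_desc):
    """Weighted position error of one feature point set: each remaining point
    contributes its matching confidence (cosine similarity of descriptors)
    times the squared descriptor difference to the representative point."""
    err = 0.0
    for frame_id, p in positions.items():          # p is an np.array([x, y])
        f = sample_desc(frame_id, p)                # descriptor at position p
        w = cosine_sim(f, rep_desc)                 # matching confidence
        err += w * float(np.sum((f - rep_desc) ** 2))
    return err

def gradient_step(positions, rep_desc, sample_desc, lr=0.1, eps=0.5):
    """One numerical-gradient descent step on the remaining points' positions,
    in the direction that reduces the set's position error."""
    new_positions = {}
    for frame_id, p in positions.items():
        grad = np.zeros(2)
        for k in range(2):
            dp = np.zeros(2)
            dp[k] = eps
            e_plus = set_position_error({frame_id: p + dp}, rep_desc, sample_desc)
            e_minus = set_position_error({frame_id: p - dp}, rep_desc, sample_desc)
            grad[k] = (e_plus - e_minus) / (2 * eps)
        new_positions[frame_id] = p - lr * grad
    return new_positions
```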
  • the server can again determine whether any of these feature point sets meets the filtering conditions, filter out the feature point sets that meet the filtering conditions, and continue to perform the subsequent steps on the feature point sets remaining after filtering.
  • for the filtering conditions, please refer to the description in the above embodiment.
  • the target spatial feature point is the spatial feature point after position optimization.
  • when a vehicle that needs to park drives into the garage entrance, it can download the feature map from the server, and the user can input the target parking space location where the vehicle is to be parked, so that the vehicle can plan a parking route from the garage entrance to the target parking space location for the user based on the feature map.
  • the vehicle drives automatically according to the planned parking route.
  • the vehicle is positioned in the following ways:
  • the feature map generation method of the present application can be applied to an application scenario of automatic cleaning by a sweeping robot.
  • the sweeping robot first walks in the area that needs to be cleaned and collects multiple frames of images of the area.
  • a feature map is generated according to the feature map generation method provided by the embodiment of the present application.
  • the cleaning route can be planned through the feature map.
  • automatic positioning is performed based on the feature map, so that cleaning tasks are performed according to the planned cleaning route.
  • embodiments of the present application also provide a feature map generation device for implementing the above-mentioned feature map generation method.
  • the solution to the problem provided by this device is similar to that described in the above method; therefore, for the specific limitations in the embodiments of the one or more feature map generation devices and positioning information determination devices provided below, reference may be made to the limitations of the feature map generation method above, which will not be repeated here.
  • a feature map generating device 800 including:
  • the feature extraction module 802 is used to obtain multiple frames of images captured for the target scene, extract image feature points from each frame of image, and determine the corresponding feature descriptor based on the extracted image feature points at the location of the corresponding image;
  • the feature point set determination module 804 is used to form a feature point set with image feature points that have a matching relationship among the image feature points of each frame image;
  • the difference calculation module 806 determines representative feature points from the feature point set, and calculates the difference between the feature descriptors corresponding to the remaining image feature points in the feature point set and the feature descriptors corresponding to the representative feature points;
  • the position update module 808 is used to determine the position error of the feature point set based on the calculated difference, and iteratively update the remaining image feature points in the feature point set based on the position error. When the iteration stop condition is met, the updated feature point set is obtained;
  • the feature map generation module 810 is used to determine the spatial feature points corresponding to the updated feature point set based on the position of each image feature point in the updated feature point set in the corresponding image, and generate a feature map based on the spatial feature points.
  • the feature map is used to position the moving device to be positioned in the target scene.
  • the above feature map generating device obtains multiple frames of images captured for the target scene, extracts image feature points from each frame of image, and determines the corresponding feature descriptors based on the positions of the extracted image feature points in the images to which they belong; forms feature point sets from the image feature points of the frames that have matching relationships; determines a representative feature point from each feature point set and calculates the differences between the feature descriptors corresponding to the remaining image feature points in the feature point set and the feature descriptor corresponding to the representative feature point; determines the position error of the feature point set based on the calculated differences and iteratively updates the remaining image feature points in the feature point set based on the position error, obtaining the updated feature point set when the iteration stop condition is met; and, based on the position of each image feature point in the updated feature point set in the image to which it belongs, determines the spatial feature point corresponding to the updated feature point set and generates a feature map based on the spatial feature points. Since the positions of the image feature points are optimized based on their feature descriptors during generation of the feature map, the generated feature map is more robust, so the positioning accuracy when using the feature map in the positioning process is greatly improved.
  • the position update module 808 is used to take each remaining image feature point in the feature point set as a target feature point and calculate the matching confidence between each target feature point and the representative feature point respectively; calculate the position error corresponding to each target feature point based on the matching confidence and the difference corresponding to that target feature point; and aggregate the position errors corresponding to the target feature points to obtain the position error of the feature point set.
  • the position update module 808 is also used to obtain the respective feature descriptor of each target feature point and the feature descriptor of the representative feature point; and to respectively calculate the vector similarity between the feature descriptor of each target feature point and the feature descriptor of the representative feature point, with each vector similarity used as the matching confidence between the corresponding target feature point and the representative feature point.
  • the difference calculation module 806 is also used to calculate the average feature point position corresponding to the feature point set based on the position of each image feature point in the feature point set in the image to which it belongs; and to determine, from the feature point set, an image feature point whose distance from the average feature point position satisfies the distance condition, taking the determined image feature point as the representative feature point; where the distance condition includes one of the following: the distance from the average feature point position is less than or equal to a distance threshold, or the image feature point is ranked before a ranking threshold when the image feature points are sorted in ascending order of distance from the average feature point position.
  • there are multiple feature point sets; the difference calculation module 806 is also configured to, for each feature point set, filter out the feature point set if it satisfies the filtering conditions, and enter the step of determining the representative feature point from the feature point set if it does not satisfy the filtering conditions; where the filtering conditions include at least one of the following: the distance between the initial spatial feature point calculated based on the feature point set and the shooting device of the multi-frame images is greater than a first preset distance threshold; the distance between the initial spatial feature point calculated based on the feature point set and the shooting device of the multi-frame images is less than a second preset distance threshold, where the second preset distance threshold is less than the first preset distance threshold; the disparity calculated based on the feature point set is greater than a preset disparity threshold; or the average reprojection error calculated based on the feature point set is greater than a preset error threshold.
  • the feature map generation module is also used to: determine the average descriptor corresponding to the updated feature point set based on the respective feature descriptors of the image feature points in the updated feature point set; select, from the feature descriptors of the image feature points in the updated feature point set, a feature descriptor whose similarity to the average descriptor satisfies the similarity condition, and use the selected feature descriptor as the reference descriptor; project the spatial feature point onto the images to which the image feature points in the updated feature point set belong to obtain multiple projected feature points, and determine the feature descriptor corresponding to each projected feature point based on its position on the image to which it belongs; determine the reprojection error corresponding to each projected feature point based on the difference between its feature descriptor and the reference descriptor; aggregate the reprojection errors corresponding to the projected feature points to obtain the target error, and iteratively update the spatial feature point based on the target error; and, when the iteration stop condition is met, obtain the target spatial feature point corresponding to the updated feature point set and generate the feature map based on the target spatial feature points.
  • the feature extraction module is also used to input the image into the trained feature extraction model and output, through the feature extraction model, the first tensor corresponding to the image feature points and the second tensor corresponding to the feature descriptors; the first tensor is used to describe the possibility of feature points appearing in each area of the image; perform non-maximum suppression processing on the image based on the first tensor to determine the image feature points of the image from the image; and convert the second tensor into a third tensor consistent with the size of the image, determining the vector in the third tensor at the position matching the image feature point's position in the image to which it belongs as the descriptor corresponding to the image feature point.
  • the first tensor includes multiple channels
  • the feature extraction module is also configured to: obtain the maximum value of the first tensor at each position along the direction of the multiple channels and the channel index corresponding to each maximum value, to obtain the third tensor and the fourth tensor respectively; determine a target value from the third tensor and search the neighborhood of the location of the target value in the third tensor, where the neighborhood of the location of the target value includes multiple target positions and the image distance between the corresponding position of each target position in the image and the corresponding position of the target value in the image is less than the preset distance threshold; and, when the search result indicates that the target value is greater than the values corresponding to the other positions in the neighborhood, determine the target pixel point in the image corresponding to the location of the target value as an image feature point of the image, where the target pixel point is determined from the image based on the location of the target value and the corresponding channel index value, and the channel index value is determined from the fourth tensor based on the location of the target value.
  • the feature extraction module is also used to: obtain multiple frames of original images of the target scene captured by the fisheye camera, perform distortion correction on the multiple frames of original images, and obtain multiple frames of images captured of the target scene.
  • the above device further includes: a positioning information determination module, configured to acquire the inertial measurement data and speed measurement data of the moving device to be positioned and the target image captured by the moving device in the target scene, and use the inertial measurement data and speed measurement data to determine the initial pose of the moving device to be positioned; determine the matching spatial feature points from the generated feature map based on the initial pose to obtain the target spatial feature points; and determine the image feature points matching the target spatial feature points from the target image, form matching pairs from the determined image feature points and the target spatial feature points, and determine the positioning information of the moving device based on the matching pairs.
  • Each module in the above feature map generating device can be implemented in whole or in part by software, hardware and combinations thereof.
  • Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in Figure 9.
  • the computer device includes a processor, a memory, an input/output interface (Input/Output, referred to as I/O), and a communication interface.
  • the processor, memory and input/output interface are connected through the system bus, and the communication interface is connected to the system bus through the input/output interface.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile storage media and internal memory.
  • the non-volatile storage medium stores operating systems, computer programs and databases. This internal memory provides an environment for the execution of operating systems and computer programs in non-volatile storage media.
  • the computer device's database is used to store feature map data.
  • the input/output interface of the computer device is used to exchange information between the processor and external devices.
  • the communication interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program when executed by the processor, implements a feature map generation method.
  • a computer device may be a terminal installed in the above-mentioned sports equipment, for example, it may be a vehicle-mounted terminal, and its internal structure diagram may be shown in FIG. 10 .
  • the computer device includes a processor, memory, input/output interface, communication interface, display unit and input device. Among them, the processor, memory and input/output interface are connected through the system bus, and the communication interface, display unit and input device are connected to the system bus through the input/output interface. Wherein, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile storage media and internal memory. The non-volatile storage medium stores operating systems and computer programs.
  • This internal memory provides an environment for the execution of operating systems and computer programs in non-volatile storage media.
  • the input/output interface of the computer device is used to exchange information between the processor and external devices.
  • the communication interface of this computer device is used for wired or wireless communication with external terminals.
  • the wireless mode can be implemented through WIFI, mobile cellular network, NFC (Near Field Communication) or other technologies.
  • the computer program when executed by the processor, implements a feature map generation method.
  • the display unit of the computer device is used to form a visually visible picture and can be a display screen, a projection device or a virtual reality imaging device.
  • the display screen can be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer device can be a touch layer covering the display screen, or buttons, a trackball or a touch pad provided on the housing of the computer device, or an external keyboard, touch pad or mouse, etc.
  • FIGS 9 and 10 are only block diagrams of partial structures related to the solution of the present application, and do not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • Computer equipment may include more or fewer components than shown in the figures, or some combinations of components, or have different arrangements of components.
  • a computer device including a memory and a processor.
  • Computer-readable instructions are stored in the memory.
  • the processor executes the computer-readable instructions, the steps of the above feature map generation method are implemented.
  • a computer-readable storage medium on which computer-readable instructions are stored.
  • the steps of the above feature map generation method are implemented.
  • a computer program product including computer readable instructions, which when executed by a processor implement the steps of the above feature map generating method.
  • the user information including but not limited to user equipment information, user personal information, etc.
  • data including but not limited to data used for analysis, stored data, displayed data, etc.
  • the computer-readable instructions can be stored in a non-volatile computer-readable storage medium.
  • the computer-readable instructions when executed, may include the processes of the above method embodiments.
  • Any reference to memory, database or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory.
  • Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc.
  • Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory.
  • By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
  • the databases involved in the various embodiments provided in this application may include at least one of a relational database and a non-relational database.
  • Non-relational databases may include blockchain-based distributed databases, etc., but are not limited thereto.
  • the processor involved in each embodiment provided in this application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic device, a data processing logic device based on quantum computing, etc., but is not limited thereto.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to a feature map generation method, apparatus, storage medium and computer device, applicable to the map field or the autonomous driving field, including: obtaining multiple frames of images, extracting image feature points from each frame of image, and determining corresponding feature descriptors based on the extracted image feature points (202); forming feature point sets from the image feature points of the frames that have matching relationships (204); determining a representative feature point from a feature point set, and calculating the differences between the feature descriptors corresponding to the remaining image feature points in the feature point set and the feature descriptor corresponding to the representative feature point (206); determining a position error based on the calculated differences, iteratively updating the remaining image feature points in the feature point set based on the position error, and obtaining an updated feature point set when an iteration stop condition is met (208); and determining spatial feature points based on the updated feature point set, so as to generate a feature map used for positioning (210). The method can improve positioning accuracy.

Description

特征地图生成方法、装置、存储介质和计算机设备
相关申请
本申请要求2022年08月08日申请的,申请号为2022109459385,名称为“特征地图生成方法、装置、存储介质和计算机设备”的中国专利申请的优先权,在此将其全文引入作为参考。
技术领域
本申请涉及计算机技术领域,特别是涉及一种特征地图生成方法、装置、计算机设备、存储介质和计算机程序产品。
背景技术
随着计算机技术的发展,出现了视觉定位技术,在视觉定位技术中,可以通过构建特征地图,特征地图是一种数据结构,可以用有关的几何特征(如点、直线、面)表示观测环境,进而辅助待定位的运动设备进行定位。例如,在自动驾驶中,可以通过构建特征地图对自动驾驶车辆进行定位。
随着自动驾驶等应用越来越广泛,对定位的精度要求越来越高,然而相关技术中构建的特征地图在使用过程中经常存在定位精度低的问题。
发明内容
根据本申请提供的各种实施例,提供一种特征地图生成方法、装置、计算机设备、计算机可读存储介质和计算机程序产品。
一方面,本申请提供了一种特征地图生成方法,由计算机执行,包括:获得针对目标场景拍摄得到的多帧图像,从各帧图像上分别提取图像特征点,基于提取的所述图像特征点在所属图像的位置确定对应的特征描述子;将各帧图像的图像特征点中具有匹配关系的图像特征点,组成特征点集合;从所述特征点集合中确定代表特征点,计算所述特征点集合中剩余的图像特征点对应的特征描述子与所述代表特征点对应的特征描述子之间的差异;基于计算得到的差异确定所述特征点集合的位置误差,基于所述位置误差迭代更新所述特征点集合中剩余的图像特征点,当满足迭代停止条件时,得到更新后的特征点集合;及基于所述更新后的特征点集合中各个图像特征点在所属图像的位置,确定所述更新后的特征点集合对应的空间特征点,基于所述空间特征点生成特征地图,所述特征地图用于对待定位的运动设备在所述目标场景中进行定位。
另一方面,本申请还提供了一种特征地图生成装置。所述装置包括:特征提取模块,用于获得针对目标场景拍摄得到的多帧图像,从各帧图像上分别提取图像特征点,基于提 取的所述图像特征点在所属图像的位置确定对应的特征描述子;特征点集合确定模块,用于将各帧图像的图像特征点中具有匹配关系的图像特征点,组成特征点集合;差异计算模块,从所述特征点集合中确定代表特征点,计算所述特征点集合中剩余的图像特征点对应的特征描述子与所述代表特征点对应的特征描述子之间的差异;位置更新模块,用于基于计算得到的差异确定所述特征点集合的位置误差,基于所述位置误差迭代更新所述特征点集合中剩余的图像特征点,当满足迭代停止条件时,得到更新后的特征点集合;及特征地图生成模块,用于基于所述更新后的特征点集合中各个图像特征点在所属图像的位置,确定所述更新后的特征点集合对应的空间特征点,基于所述空间特征点生成特征地图,所述特征地图用于对待定位的运动设备在所述目标场景中进行定位。
另一方面,本申请还提供了一种计算机设备。所述计算机设备包括存储器和处理器,所述存储器存储有计算机可读指令,所述处理器执行所述计算机程序时实现上述特征地图生成方法的步骤。
另一方面,本申请还提供了一种计算机可读存储介质。所述计算机可读存储介质,其上存储有计算机可读指令,所述计算机程序被处理器执行时实现上述特征地图生成方法的步骤。
另一方面,本申请还提供了一种计算机程序产品。所述计算机程序产品,包括计算机程序,该计算机程序被处理器执行时实现上述特征地图生成方法的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征、目的和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例或传统技术中的技术方案,下面将对实施例或传统技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据公开的附图获得其他的附图。
图1为一个实施例中特征地图生成方法的应用环境图;
图2为一个实施例中特征地图生成方法的流程示意图;
图3为一个实施例中特征点集合的组成示意图;
图4为一个实施例中基于空间特征点生成特征地图的流程示意图;
图5为一个实施例中在输入图像中确定对应位置的示意图;
图6为一个实施例中特征提取模型的结构示意图;
图7为一个实施例中定位信息确定步骤的流程示意图;
图8为一个实施例中特征地图生成装置的结构框图;
图9为一个实施例中计算机设备的内部结构图;
图10为另一个实施例中计算机设备的内部结构图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供的特征地图生成方法可应用于智能交通***(Intelligent Traffic System,ITS),以及智能车路协同***(Intelligent Vehicle Infrastructure Cooperative Systems,IVICS)。
本申请实施例提供的特征地图生成方法,可以应用于如图1所示的应用环境中。其中,运动设备102通过网络与服务器104进行通信。其中,运动设备102指的是可以自主运动的设备或者被动运动的设备中的其中一种,自主运动的设备可以是各种交通工具、机器人等等,被动运动的设备例如可以是由用户携带并跟随用户的移动而移动的终端,被动运动的设备例如可以是智能手机、平板电脑和便携式可穿戴设备。运动设备102上安装有拍摄设备。服务器104可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式***,还可以是提供云计算服务的云服务器。具体地,在特征地图生成阶段,任意一个运动设备上的拍摄设备可以对目标场景进行拍摄得到多帧图像并将拍摄得到的多帧图像发送至服务器,服务器基于各帧图像生成特征地图并保存,在定位信息确定阶段,待定位的运动设备可以将惯性测量数据、速度测量数据以及在该目标场景中拍摄得到的目标图像发送至服务器,服务器可以基于这些数据以及保存的特征地图,确定待定位的运动设备的定位信息,将定位信息发送至待定位的运动设备。
可以理解的是,在其他实施例中,任意一个运动设备在目标场景中运动时,该运动设备上的拍摄设备可以对目标场景进行拍摄得到多帧图像,然后基于各帧图像生成特征地图并保存,从而当该运动设备再次在该目标场景中运动时,可以基于保存的特征地图确定定位信息。同时,该运动设备生成的特征地图还可以发送至服务器,其他待定位的运动设备在该目标场景中运动时,可以下载该特征地图,基于下载的特征地图确定定位信息,或者其他待定位的运动设备在该运动场景中运动时,可以将惯性测量数据、速度测量数据以及在该目标场景中拍摄得到的目标图像发送至服务器,服务器可以基于这些数据以及保存的特征地图,确定待定位的运动设备的定位信息,将定位信息返回至该待定位的运动设备。
在一个实施例中,如图2所示,提供了一种特征地图生成方法,该方法由计算机设备执行,具体可以由图1中的运动设备或服务器等计算机设备单独执行,也可以由运动设备和服务器协同执行,在本申请实施例中,以该方法应用于图1中的服务器为例进行说明,包括以下步骤:
步骤202,获得针对目标场景拍摄得到的多帧图像,从各帧图像上分别提取图像特征点,基于提取的图像特征点在所属图像的位置确定对应的特征描述子。
其中,目标场景指的是所要生成的特征地图所针对的具体的场景,目标场景具体可以是车辆所处的环境,例如,目标场景可以是车辆的可能行驶路线所确定的场景,车辆在该场景中行驶的过程中通过摄像头采集获得周边的多帧图像。图像特征点是图像上的某些可用于描述场景的特征的像素点,例如显著的边缘点、方向梯度直方图特征、Haar特征等。特征描述子与图像特征点是一一对应的,特征描述子是对特征点附近邻域内高斯图像梯度统计结果的一种表示,特征描述子可以用于描述对应的图像特征点。
具体地,运动设备可以采集多帧图像,实时传输给服务器进行处理,或者运动设备可以只负责将采集到的多帧图像存储起来,在图像采集完成后,将存储的多帧图像以某种方式输入至服务器进行处理。服务器在获得针对目标场景拍摄得到的多帧图像后,可以从各帧图像上分别提取图像特征点,针对每个图像特征点,运动设备可以基所针对的图像特征点在所属图像的位置确定该图像特征点对应的特征描述子,从而可以得到各帧图像各自的图像特征点,以及各个图像特征点各自的特征描述子。
在一个实施例中,对图像特征点的提取可以采用,但不限于Good Features to Track等算法实现,在计算机视觉库OpenCV里提供了相应的函数。在其他实施例中,还可以通过训练机器学习模型对图像进行特征点提取,机器学习模型包括多个卷积层,各个卷积层对原始图像进行不同的处理后,输出特征图像,该特征图像上表征了原始图像中各个位置为特征点的可能性,进而基于特征图像可以确定原始的特征点。可以理解的是,每帧图像上都可以提取出多个图像特征点。多个指的是至少两个。
步骤204,将各帧图像的图像特征点中具有匹配关系的图像特征点,组成特征点集合。
其中,具有匹配关系的图像特征点指的是相似的图像特征点。在一个实施例中,具有匹配关系的图像特征点可以通过其特征描述子进行判断,在两个图像特征点的特征描述子达到一定的近似程度时认为二者是匹配的。
具体地,服务器可以将各帧图像中的各图像特征点,按照相互之间的匹配关系进行集合划分,从而可以得到多个特征点集合,这些特征点集合中,属于同一特征点集合的各图像特征点相互之间具有匹配关系。举例说明,如图3所示,假设总共有3帧图像,第一帧图像包括图像特征点A1、A2和A3,第二帧图像包括图像特征点B1、B2、B3和B4,第三帧图像包括图像特征点C1、C2和C3,假设其中A1、B1和C1为相互之间具有匹配关系的图像特征点,A2、B2和C2为相互之间具有匹配关系的图像特征点,A3、B3和C3为相互之间具有匹配关系的图像特征点,则A1、B1和C1可以组成特征点集合1,A2、B2和C2可以组成特征点集合2,A3、B3和C3可以组成特征点集合。
在一个实施例中,假设多帧图像共有M帧,取i为1至M,首先提取出第i帧图像上的N个图像特征点,提取图像特征点的方法可以参考前文。若i=1,即对于第一帧图像, 可以对其中的每个图像特征点均创建一个对应的特征点集合。若i>1,取j为1至N,判断第i-1帧图像中是否存在与第i帧图像中的第j个图像特征点匹配的图像特征点,若存在与之匹配的图像特征点,则将第j个图像特征点加入到与之匹配的图像特征点对应的特征点集合中(由于第i-1帧已经处理完毕,该特征点集合必然已经存在);若不存在与之匹配的图像特征点,创建与第j个图像特征点对应的特征点集合。一旦某个特征点集合不再有新的图像特征点加入,即可以认为该特征点集合构建完成。采用上述方法逐帧处理图像,在第M帧图像处理完成后,得到多个特征点集合,每个特征点集合中包括至少一个图像特征点,或者包括一个相互之间具有匹配关系的图像特征点序列。可以理解的是,在实际应用中,如果是实时构建特征地图,M未必是已知的,但具体的步骤和上面是类似的,只需要一直将i进行递增直至处理完所有图像即可。
步骤206,从特征点集合中确定代表特征点,计算特征点集合中剩余的图像特征点对应的特征描述子与代表特征点对应的特征描述子之间的差异。
其中,代表特征点集合指的是特征点集合中可以对特征点集合进行代表的图像特征点。特征点集合中剩余的图像特征点指的是特征点集合中除代表特征点之外的图像特征点。举例说明,假设某个特征点集合包括A1,B1,C1和D1总共四个图像特征点,其中A1为代表特征点,则B1,C1和D1为剩余的图像特征点。在一个实施例中,服务器可以从各个特征点集合中随机选择一个图像特征点,作为各个特征点集合各自的代表特征点。在其他实施例中,服务器可以计算各个特征点集合各自的平均特征点,将各个特征点集合中与各自的平均特征点相隔最近的图像特征点确定为代表特征点。
具体地,为防止对特征点集合中的图像特征点迭代更新的过程中发生整体偏移,本实施例中,可以在每个特征点集合中确定一个代表特征点,在迭代更新的过程中,保持该代表特征点的位置固定不动,从而计算各个特征点集合中剩余的图像特征点对应的特征描述子与各特征点集合各自的代表特征点对应的特征描述子之间的差异,得到各个剩余的图像特征点各自对应的差异。
在一个实施例中,对于每一个特征点集合,服务器可以计算该特征点集合中各个剩余的图像特征点对应的特征描述子与该特征点集合对应的特征描述子之间的绝对差值,得到各个剩余的图像特征点各自对应的差异。在其他实施例中,服务器在计算得到绝对差值后,可以计算绝对差值的平方,得到各个剩余的图像特征点各自对应的差异。
步骤208,基于计算得到的差异确定特征点集合的位置误差,基于位置误差迭代更新特征点集合中剩余的图像特征点,当满足迭代停止条件时,得到更新后的特征点集合。
其中,迭代停止条件例如可以是位置误差达到最小值,迭代次数达到预设次数,或者迭代时长达到最小时长中的其中一种。
具体地,由于在确定图像特征点时,每个特征点集合确定一个空间特征点,为提高所确定的空间特征点的精度,需要减少特征点集合的整体的位置误差,基于此,本实施例中, 对于每一个特征点集合,服务器可以统计该特征点集合中各个剩余的图像特征点对应的差异,基于统计得到的差异确定特征点集合的位置误差,朝着最小化该位置误差的方向迭代更新代表特征点之外的各个图像特征点的位置,每更新一次,相当于对图像特征点的位置进行了一次优化,基于优化得到的图像特征点对应的特征描述子重新计算位置误差,进行下一次更新,不断重复该步骤对图像特征点的位置进行多次优化,当满足迭代停止条件时,将属于同一个特征点集合中的各个更新后的图像特征点和代表特征点组成更新后的特征点集合。在更新的过程中,可以采用梯度下降算法对图像特征点的位置进行更新。
在一个实施例中,为了防止优化过程中发生退化,服务器可以针对特征点集合计算海森矩阵的奇异值,如果最大的奇异值除以最小的奇异值大于预设阈值,则放弃更新。
步骤210,基于更新后的特征点集合中各个图像特征点在所属图像的位置,确定更新后的特征点集合对应的空间特征点,基于空间特征点生成特征地图,特征地图用于对待定位的运动设备在目标场景中进行定位。
其中,空间特征点指的是三维特征点,即图像上的特征点在三维空间中的对应点。本实施例中的特征地图可以是包括多个空间特征点的一种数据结构,其具体形式不作限定。待定位的运动设备指的是需要进行定位的运动设备,待定位的运动设备与发送多帧图像的运动设备可以是相同的运动设备,也可以是不相同的运动设备。图像特征点所属图像的位姿是指拍摄该帧图像时摄像头的位姿,这一位姿可以基于同一时刻运动设备的位姿,以及摄像头与运动设备之间的相对位姿关系,经由姿态变换而获得。
具体地,对于每一个更新后的特征点集合,服务器可以基于更新后的特征点集合中各个图像特征点在所属图像的位置,以及所属图像的位姿进行三角化计算,得到各个特征点集合各自对应的空间特征点。进一步地,服务器可以基于各个空间特征点生成特征地图,将特征地图进行存储,进而在后续的定位过程中,可以利用该特征地图对待定位的运动设备进行辅助定位。其中,三角化(Triangulation)计算是一种将二维的图像特征点映射到三维的空间特征点的现有方法,此处不进行详细阐述。可以理解的是,空间特征点的描述子可以为生成该空间特征点的所有图像特征点的描述子的平均值。
在一个实施例中,服务器具体可以通过以下步骤确定图像特征点所属图像的位姿:首先,获得运动设备与摄像头之间的相对位姿,该相对位姿在运动设备的运动过程中通常是保持不变的,可以通过标定获得。然后,根据运动设备上传的惯性测量数据和速度测量数据确定运动设备在各个时刻的位姿。之后,将运动设备在各个时刻的位姿与多帧图像的采集时间进行对齐,这里的对齐指的是确定每帧图像对应的运动设备的位姿,该位姿对应的数据采集时刻(采集惯性测量数据和速度测量数据的时刻)与该帧图像的采集时刻相同(或者在误差允许范围内相同)。最后,基于与每帧图像对应的运动设备的位姿以及运动设备与摄像头之间的相对位姿进行姿态变换,即可获得该帧图像的位姿。
上述特征地图生成方法中,通过获得针对目标场景拍摄得到的多帧图像,从各帧图像 上分别提取图像特征点,基于提取的图像特征点在所属图像的位置确定对应的特征描述子,将各帧图像的图像特征点中具有匹配关系的图像特征点,组成特征点集合,从特征点集合中确定代表特征点,计算特征点集合中剩余的图像特征点对应的特征描述子与代表特征点对应的特征描述子之间的差异,基于计算得到的差异确定特征点集合的位置误差,基于位置误差迭代更新特征点集合中剩余的图像特征点,当满足迭代停止条件时,得到更新后的特征点集合;基于更新后的特征点集合中各个图像特征点在所属图像的位置,确定更新后的特征点集合对应的空间特征点,基于空间特征点生成特征地图,由于在生成特征地图的过程中,基于图像特征点的特征描述子对图像特征点进行了位置优化,可以使得生成的特征地图更加鲁棒,从而利用该特征地图在定位的过程中定位精度得到了大大提升。
在一个实施例中,基于计算得到的差异确定特征点集合的位置误差,包括:将特征点集合中各个剩余的图像特征点分别作为目标特征点,分别计算各个目标特征点和代表特征点之间的匹配置信度;基于各个目标特征点各自对应的匹配置信度和差异,计算各个目标特征点各自对应的位置误差;统计各个目标特征点各自对应的位置误差,得到特征点集合的位置误差。
其中,目标特征点和代表特征点之间的匹配置信度是用于表征目标特征点和代表特征点之间的匹配程度的,匹配程度越高,代表两个特征点越相似。
具体地,对于每一个特征点集合,服务器可以将该特征点集合中各个剩余的图像特征点分别作为目标特征点,对于每一个目标特征点,服务器可以计算该目标特征点和代表特征点之间的匹配置信度,进而将匹配置信度和差异相乘,得到该目标特征点对应的位置误差,最后统计各个目标特征点各自对应的位置误差,得到该特征点集合的位置误差。其中,统计可以是加和、计算平均值或者计算中位值中的其中一种。
在一个具体的实施例中,服务器可以通过以下公式(1)计算得到特征点集合的位置误差,其中,j代表第j个特征点集合,为第j个特征点集合的位置误差,u和v代表图像特征点,i(u)代表i帧图像的第u个图像特征点,k(v)代表第k帧图像的第v个特征点,wuv为匹配置信度,pu代表图像特征点u在图像上的位置,pv代表图像特征点v在图像上的位置,Fi(u)[pu]代表pu的描述子,Fk(v)[pv]代表pv的描述子:
本实施例中,通过计算图像特征点的匹配置信度,基于匹配置信度和差异得到图像特征点的位置误差,可以使得各个图像特征点的位置误差更加准确,从而最后通过统计特征点集合中各个图像特征点的位置误差得到的特征点集合的位置误差更加准确,从而可以得到精度更高的特征地图,进一步提升定位精度。
在一个实施例中,分别计算各个目标特征点和代表特征点之间的匹配置信度,包括:分别获取各个目标特征点各自的特征描述子,并获取代表特征点的特征描述子;分别计算各个目标特征点各自的特征描述子和代表特征点的特征描述子之间的向量相似度,将各个向量相似度作为各自对应的目标特征点与代表特征点之间的匹配置信度。
其中,向量相似度是用于描述两个向量之间的相似程度的,特征描述子为向量形式,因此可以进行向量相似度的计算。在一个实施例中,向量相似度例如可以是余弦相似度。
具体地,服务器可以分别获取各个目标特征点各自的特征描述子,并获取代表特征点的特征描述子,然后分别计算各个目标特征点各自的特征描述子和代表特征点的特征描述子之间的向量相似度,将各个向量相似度作为各自对应的目标特征点的匹配置信度。举例说明,假设某个特征点集合中包括图像特征点A1、B1和C1,其中C1为代表特征点,则可以分别获取A1、B1和C1各自的特征描述子,计算图像特征点A1的特征描述子与代表特征点C1之间的向量相似度,作为图像特征点A1和代表特征点C1之间的匹配置信度,计算图像特征点B1的特征描述子与代表特征点C1之间的向量相似度,作为图像特征点B1和代表特征点C1之间的匹配置信度。
上述实施例中,通过计算特征描述子之间的向量相似度作为匹配置信度,由于特征描述子是对图像特征点进行描述的,因此得到的匹配置信度更加准确。
在一个实施例中,从特征点集合中确定代表特征点,包括:基于特征点集合中的各个图像特征点在所属图像中的位置,计算特征点集合对应的平均特征点位置;从特征点集合中确定与平均特征点位置之间的距离满足距离条件的图像特征点,将确定的图像特征点作为代表特征点。
其中,距离条件包括与平均特征点位置之间的距离小于或者等于距离阈值,或者按照与平均特征点位置之间的距离升序排列时排序在排序阈值之前中的其中一种。
具体地,对于每一个特征点集合,服务器可以获取该特征点集合中的各个图像特征点在所属图像中的位置,将同一维度的位置数值相加再求平均值得到该维度的目标数值,各个维度的目标数值即确定了该特征点集合对应的平均特征点位置。举例说明,假设某个特征点集合中包括图像特征点A1,B1和C1,其中A1在所属图像中的位置为(x1,y1),B1在所属图像中的位置为(x2,y2),C1在所属图像中的位置为(x3,y3),则该特征点集合对应的平均特征点位置为((x1+x2+x3)/3,(y1+y2+y3)/3)。
对于每一个特征点集合,服务器在计算得到该特征点集合对应的平均特征点位置后,可以计算该特征点集合中各个图像特征点的位置与该平均特征点位置之间的距离,根据计算得到的距离筛选满足距离条件的图像特征点,将筛选得到的图像特征点确定为代表特征点。
在一个具体的实施例中,距离条件包括与平均特征点位置之间的距离小于或者等于距离阈值,那么服务器在计算得到各个图像特征点与所属特征点集合对应的平均特征点位置 之间的距离后,分别与距离阈值进行比较,若只有一个图像特征点与所属特征点集合对应的平均特征点位置之间的距离小于距离阈值,则将该图像特征点确定为代表特征点,若是有多个图像特征点与所属特征点集合对应的平均特征点位置之间的距离小于距离阈值,则可以在这些图像特征点中选择一个作为代表特征点,例如,可以选择距离最小的图像特征点作为代表特征点。
在另一个具体的实施例中,距离条件包括按照与平均特征点位置之间的距离升序排列时排序在排序阈值之前,那么服务器在计算得到各个图像特征点与所属特征点集合对应的平均特征点位置之间的距离后,可以按照距离对各个图像特征点进行升序排列,从排序在排序阈值之前的图像特征点中选择得到代表特征点,例如,排序阈值为2,则可以选择排序在第一位的图像特征点作为代表特征点。
上述实施例中,基于特征点集合中的各个图像特征点在所属图像中的位置,计算特征点集合对应的平均特征点位置,从特征点集合中确定与平均特征点位置之间的距离满足距离条件的图像特征点,将确定的图像特征点作为代表特征点,确定的代表特征点可以更好的体现特征点集合的整***置特性。
在一个实施例中,特征点集合包括多个;从特征点集合中确定代表特征点,包括:对于每一个特征点集合,在特征点集合满足过滤条件的情况下,对特征点集合进行过滤;在特征点集合不满足过滤条件的情况下,进入从特征点集合中确定代表特征点的步骤。
本实施例中,过滤条件包括以下至少一种:基于特征点集合计算得到的初始空间特征点与多帧图像的拍摄相机之间的距离大于第一预设距离阈值;基于特征点集合计算得到的初始空间特征点与多帧图像的拍摄设备之间的距离小于第二预设距离阈值,第二预设距离阈值小于第一预设距离阈值;基于特征点集合计算得到的视差大于预设视差阈值;基于特征点集合计算得到的平均重投影误差大于预设误差阈值。
其中,初始空间特征点指的是基于未更新的特征点集合中各个图像特征点在所属图像的位置确定的空间特征点。对特征点集合进行过滤即从多个特征点集合中去除该特征点集合。
具体地,针对多个特征点集合中每个特征点集合,服务器可以基于该所针对的特征点集合计算初始空间特征点,然后可以计算初始空间特征点与多帧图像的拍摄设备之间的距离,若距离大于第一预设距离阈值,即空间特征点距离拍摄设备太远时,过滤该所针对的特征点集合;若距离小于第二预设距离阈值,即空间特征点距离拍摄设备太近时,过滤该所针对的特征点集合,第二预设距离阈值小于第一预设距离阈值。
进一步,针对上一步过滤后剩余的每个特征点集合,服务器还可以基于所针对的特征点集合进行视差计算,若计算得到的视差大于预设视差阈值,则过滤掉该所针对的特征点集合。
进一步,针对上一步过滤后剩余的每个特征点集合,服务器还可以将基于特征点集合 计算得到的初始空间特征点投影至该特征点集合中各个图像特征点所属图像上,计算各个图像特征点与投影至各自所属图像上的投影特征点之间的距离,得到各个投影距离,然后计算投影距离的平均值,得到平均重投影误差,若平均重投影误差大于预设误差阈值,则过滤掉该特征点集合。
可以理解的是,在其他一些实施例中,过滤过程中的过滤条件还可以是上述几个条件中的一部分,并且按照各过滤条件进行过滤的顺序可以不局限于上文的顺序。
对于未被过滤的特征点集合,服务器可以进入上述步骤“从特征点集合中确定代表特征点”,以从这些特征点集合中确定代表特征点,从而通过上文实施例提供的方法对这些特征点集合的图像特征点进行位置优化,得到各个更新后的特征点集合,最终基于各个更新后的特征点集合中各个图像特征点在所属图像的位置,确定各个更新后的特征点集合对应的空间特征点,得到多个空间特征点从而生成特征地图。
上述实施例中,通过设置过滤条件,将满足过滤条件的特征点集合进行过滤,进一步提升了特征地图的鲁棒性,从而进一步提升了利用特征地图进行辅助定位时的定位精度。
在一个实施例中,如图4所示,基于空间特征点生成特征地图包括:
步骤402,基于更新后的特征点集合中各个图像特征点各自的特征描述子,确定更新后的特征点集合对应的平均描述子。
具体地,对于每一个更新后的特征点集合,服务器可以参考以下公式(2)计算得到该特征点集合对应的平均描述子:
其中,uj为平均描述子,j代表第j个特征点集合(更新后的特征点集合),f为第j个特征点集合中图像特征点的描述子,代表第j个特征点集合对应的特征描述子集合,RD代表D维实数空间。
步骤404,从更新后的特征点集合中各个图像特征点各自的特征描述子中,选择与平均描述子之间的相似度满足相似度条件的特征描述子,将选择到的特征描述子作为参考描述子。
其中,相似度条件可以是相似度大于预设相似度阈值或者按照相似度降序排列时排序在排序阈值之前中的其中一种。
在一个具体的实施例中,相似度条件包括相似度大于预设相似度阈值,那么对于每一个更新后的特征点集合,服务器在计算得到该特征点集合对应的平均描述子后,分别计算该特征点集合中各个图像特征点的特征描述子与该平均描述子之间的相似度,然后将各个相似度分别与相似度阈值进行比较,若只有一个图像特征点对应的相似度大于预设相似度阈值,则将该图像特征点的特征描述子确定为参考描述子,若是有多个图像特征点对应的相似度均大于预设相似度阈值,则可以在这些图像特征点对应的特征描述子中选择一个作为参考描述子,例如,可以选择相似度最大的特征描述子作为参考描述子。
在另一个具体的实施例中,距离条件包括按照相似度降序排列时排序在排序阈值之前, 那么对于每一个更新后的特征点集合,服务器在计算得到该特征点集合中各个图像特征点的特征描述子与该平均描述子之间的相似度后,可以按照相似度对各个图像特征点的特征描述子进行降序排列,从排序在排序阈值之前的特征描述子中选择得到参考描述子,例如,排序阈值为2,则可以选择排序在第一位的特征描述子作为参考描述子。
在另一个具体的实施例中,服务器可以参考以下公式(3)计算得到参考描述子:
其中,fj为参考描述子,j代表第j个特征点集合(更新后的特征点集合),uj为平均描述子,f代表第j个特征点集合中图像特征点的特征描述子,代表第j个特征点集合对应的特征描述子集合。
步骤406,将空间特征点投影至更新后的特征点集合中各个图像特征点所属图像上,得到多个投影特征点,基于投影特征点在所属图像上的位置确定投影特征点对应的特征描述子。
步骤408,基于投影特征点对应的特征描述子和参考描述子之间的差异,确定投影特征点对应的重投影误差。
步骤410,统计各个投影特征点各自对应的重投影误差得到目标误差,基于目标误差迭代更新空间特征点,当满足迭代停止条件时,得到更新后的特征点集合对应的目标空间特征点,基于目标空间特征点生成特征地图。
具体地,对于每一个更新后的特征点集合,服务器在确定了其对应的空间特征点后,可以将该空间特征点投影至该特征点集合中各个图像特征点所属图像上,得到该空间特征点对应的多个投影特征点,进一步基于各个投影特征点在所属图像上的位置可以确定各个投影特征点各自对应的特征描述子,然后分别计算各个投影特征点和步骤404中计算得到的该更新后的特征点集合对应的参考描述子之间的差异,得到各个投影特征点各自对应的重投影误差,最后统计各个重投影误差得到该更新后的特征点集合对应的目标误差,朝着最小化该目标误差的方向迭代更新该更新后的特征点集合对应的空间特征点,即将更新得到的空间特征点作为当前空间特征点,再次进入步骤406中,不断重复步骤406-步骤410,直至满足迭代停止条件时,得到的空间特征点即为目标空间特征点,进而可以基于目标空间特征点生成特征地图。其中,迭代停止条件可以是目标误差达到最小值、迭代次数达到预设次数或者迭代时长达到预设时长中的其中一种。
在一个具体的实施例中,服务器在执行上述步骤406至步骤410时可以参考以下公式(4)计算得到目标误差:
其中,为目标误差,j为第j个特征点集合(更新后的特征点集合),Z(j)表示第j个特征点集合中各个图像特征点所属图像的集合,i表示第i帧图像,Ci表示第i帧图 像对应的相机内参,Pj指的是第j个特征点集合对应的空间特征点,Ri为第i帧图像对应的旋转矩阵,ti为第i帧图像对应的平移矩阵,fj为第j个特征点集合对应的参考描述子。
上述实施例中,通过确定了参考描述子,将空间特征点投影至更新后的特征点集合中各个图像特征点所属图像上,得到多个投影特征点,基于投影特征点在所属图像上的位置确定投影特征点对应的特征描述子,基于投影特征点对应的特征描述子和参考描述子之间的差异,确定投影特征点对应的重投影误差,统计各个投影特征点各自对应的重投影误差得到目标误差,基于目标误差迭代更新空间特征点,当满足迭代停止条件时,得到目标空间特征点,实现了对空间特征点的位置优化,基于优化后得到的目标空间特征点生成的特征地图在用于定位时,可以进一步提升定位精度。
在一个实施例中,多帧图像是由安装于目标运动设备上的摄像头拍摄的;上述特征地图生成方法还包括:获取目标运动设备在拍摄多帧图像时的惯性测量数据和速度测量数据,利用惯性测量数据和速度测量数据,计算待定位的运动设备的初始位姿;基于惯性测量数据确定预积分信息,基于预积分信息和速度测量数据构建因子图,基于因子图对初始位姿进行调整,得到目标位姿;基于空间特征点生成特征地图,包括:建立空间特征点和目标位姿之间的对应关系,基于对应关系和空间特征点生成特征地图。
在一个实施例中,从各帧图像上分别提取图像特征点,基于提取的图像特征点在所属图像的位置确定对应的特征描述子,包括:将图像输入已训练的特征提取模型中,通过特征提取模型输出图像特征点对应的第一张量以及特征描述子对应的第二张量;第一张量用于描述图像中各个区域出现特征点的可能性;基于第一张量对图像进行非极大值抑制处理,以从图像中确定图像的图像特征点;将第二张量转换为与图像尺寸一致的第三张量,将第三张量中与图像特征点在所属图像中的位置匹配处的向量确定为图像特征点对应的描述子。
具体地,服务器将图像输入已训练的特征提取模型中,通过特征提取模型输出图像特征点对应的第一张量以及特征描述子对应的第二张量,该第一张量和第二张量均为多通道的张量,并且在每一个通道上的尺寸都是小于原始的输入图像的,其中第一张量中各个位置的值用于描述原始的输入图像中各个对应区域出现特征点的可能性,即概率值。举例说明,假设输入特征提取模型的图像尺寸为HxW,则输出的第一张量可以为H/N1 x W/N1 x X1,第二张量可以为H/N2 x W/N2 x X2,其中N1、N2、X1以及X2均为大于1的正整数。
在一个实施例中,在基于第一张量对图像进行非极大值抑制处理时,服务器可以首先将第一张量转换成与输入图像尺寸相同的概率图,在该概率图中搜索局部最大值,将局部最大值所在的位置确定为目标位置,由于该概率图和输入图像的尺寸是一致的,从而可以直接将输入图像中与该目标位置相同位置处的像素点确定为输入图像的图像特征点。
在另一个实施例中,考虑到将第一张量转换成与输入图像尺寸相同的概率图这一过程 比较耗时,在基于第一张量对图像进行非极大值抑制处理时,服务器可以通过以下步骤实现:
1、沿多个通道的方向获取第一张量在各个位置处的最大值以及各个最大值对应的通道索引,分别得到第三张量和第四张量。
具体地,假设第一张量包括N(N大于或者等于2)个通道,则对于第一张量中的每一个像素位置,服务器可以沿N个通道的方向搜索最大值,将各个像素位置处搜索到的最大值作为第三张量中对应位置处的数值,从而可以得到第三张量,同时将各个像素位置处搜索到的最大值所在通道索引作为第三张量中对应位置处的数值,从而可以得到第四张量。
2、从第三张量中确定目标数值,并对第三张量中目标数值所在位置的邻域进行搜索,目标数值所在位置的邻域包括多个目标位置,目标位置在图像中的对应位置,与目标数值所在位置在图像中的对应位置之间的图像距离小于预设距离阈值。
具体地,服务器可以将第三张量中各个数值按照从小到大进行排序,得到数值集合,然后依次遍历数值集合中的各个数值,对于遍历到的数值,判断是否小于预设阈值,如果小于预设阈值,则继续遍历下一个数值,如果大于预设阈值,则将遍历到的数值确定为目标数值,从而第三张量中目标数值所在位置的邻域进行搜索。由于第三张量相对于原始的输入图像尺寸是减小的,而图像特征点指的是输入图像中的像素点,因此目标数值所在位置的邻域需要根据目标数值在第三张量中的像素位置在原始的输入图像中的对应位置进行确定,也就是说,如果目标数值所在位置的邻域包括多个目标位置,那么其中每一个目标位置在输入图像中的对应位置,与目标数值所在位置在图像中的对应位置之间的图像距离小于预设距离阈值,即每一个目标位置在输入图像中的对应位置落在目标数值所在位置在图像中的对应位置的邻域范围内。举例说明,如图5所示,假设目标数值所在位置为A点,A点在输入图像中的对应位置为B点,如果图5中的虚线框代表B点的邻域,则A点在第三张量中的邻域内的每个目标位置在输入图像中的对应位置都落在该虚线框内。
在一个实施例中,考虑到第一张量中不同的通道提取的特征是不相同的,那么第三张量中的像素位置在原始图像中的对应位置与该像素位置所在的通道是相关,对于第三张量中的像素位置(i,j),从第四张量中对应位置处确定索引值为D[i,j],则其在原始图像中的对应位置为(N x i+D[i,j]/8,Nx j+D[i,j]%8),其中N为第三张量相对于原始输入图像的缩小比例。举例说明,假设原始的输入图像是640x480,第一张量是80x60x64,第二张量是80x60x256,第三张量是80x60(每个数值表示第一张量64维度最大值,小数类型),D是80x60(每个数值表示第一张量64维度最大值对应索引,整数类型),第一张量64维度对应原图每8x8区域,那么第一张量的一个坐标(32,53,35)对应原图上坐标是(32x8+35/8,53x8+35%8)=(260,427)。
因此,可以通过计算两个像素位置在第四张量中对应位置之间的距离,作为这两个像素位置在原始的输入图像中对应位置之间的距离。举例说明,对于第三张量中某个像素位 置(i,j)和另外一个像素位置(i+n,j+n),这两个像素位置在原始图像上对应位置之间的距离可以通过计算第四张量中像素位置(i,j)和像素位置(i+n,j+n)之间的距离得到。
3、在搜索结果指示目标数值大于邻域内的其他位置对应的数值的情况下,将图像中与目标数值所在位置相对应的目标像素点确定为图像的图像特征点。
其中,目标像素点是基于目标数值所在位置及对应的通道索引值从图像中确定的,通道索引值是基于目标数值所在位置从第四张量中确定。举例说明,假设第三张量中某个目标数值所在像素位置坐标为(i,j),那么该像素位置在第四张量中的对应位置同样为(i,j),假设第四张量中该位置处的数值为D[i,j],则在搜索结果指示目标数值大于邻域内的其他位置对应的数值的情况下,将原始的输入图像中坐标为(N x i+D[i,j]/8,Nx j+D[i,j]%8)的像素点确定为与目标数值所在位置相对应的目标像素点,其中N为第三张量相对于原始输入图像的缩小比例。
在一个具体的实施例中,上述实施例中的特征提取模型的具体结构可以如图6所示,其中,第一卷积块为3*3的全卷积层,步长stride=1,输出通道为64;第一池化块为3*3的最大池化层,步长stride=1,输出通道为64;第二卷积块为3*3的全卷积层,步长stride=2,输出通道为64;第三卷积块为3*3的全卷积层,步长stride=1,输出通道为64;第四卷积块为3*3的全卷积层,步长stride=2,输出通道为64;第五卷积块为3*3的全卷积层,步长stride=1,输出通道为64;第六卷积块为3*3的全卷积层,步长stride=2,输出通道为128;第七卷积块为3*3的全卷积层,步长stride=1,输出通道为128;第八卷积块为3*3的全卷积层,步长stride=1,输出通道为128;第九卷积块为3*3的全卷积层,步长stride=1,输出通道为128;第十卷积块为1*1的全卷积层,步长stride=2,输出通道为64;第十一卷积块为1*1的全卷积层,步长stride=2,输出通道为64;第十二卷积块为1*1的全卷积层,步长stride=2,输出通道为128;第十三卷积块为1*1的全卷积层,步长stride=2,输出通道为128;第十四卷积块为3*3的全卷积层,步长stride=1,输出通道为128;第十五卷积块为3*3的全卷积层,步长stride=1,输出通道为64;第十六卷积块为3*3的全卷积层,步长stride=1,输出通道为128;第十七卷积块为3*3的全卷积层,步长stride=1,输出通道为256。
假设输入图像维度为HxW,特征提取模型的第十五卷积块输出是特征点的张量A,维度是H/8xW/8x64,右边输出的是描述子的张量B,维度是H/8xW/8x256。则提取特征点和描述子的具体步骤如下:
1、沿着64通道维度,获取最大值以及最大值对应的索引,由此可得到两个H/8xW/8的张量C和D。
2、对张量C中的概率值做从大到小排列为集合E,设定目标集合F,用于存放特征点下标和置信度。
3、遍历集合E,并获取对应值在张量D中的下标i和j。
4、如果C[i,j]小于一定阀值(例如可以是0.05)跳过。
5、对C[i,j]的邻域n进行遍历。
6、计算D[i+n,j+n](或D[i-n,j-n])与D[i,j]的距离,也就是原图上坐标(8x(i+n)+D[i+n,j+n]/8,8x(j+n)+D[i+n,j+n]%8)和坐标(8 x i+D[i,j]/8,8x j+D[i,j]%8)之间的距离,如果大于一定阀值则跳过。
7、如果C[i+n,j+n](或C[i-n,j-n])大于C[i,j],则退出步骤5的遍历,否则继续执行步骤5。
8、如果步骤5执行完,且C[i,j]大于任意的C[i+n,j+n](或C[i-n,j-n]),则将C[i,j]和(ix8+D[i,j]/8,jx8+D[i,j]%8)放在目标集合F中。
9、继续执行步骤3。
10、对张量B进行双线性插值,得到张量G,维度为HxWx256,并且沿着通道方向进行L2范数计算。
11、根据目标集合F的结果去张量G中寻找对应的描述子,即对于目标集合F中的每一个图像特征点的下标,从张量G中找到与之下标相同的位置,将该位置处各个通道的值所组成的向量作为该图像特征点的特征描述子,则特征描述子为256维的向量。例如,对于目标集合F中的某个图像特征点的下标(10,13),从张量G中找到与(10,13)所在位置,将该位置处各个通道的值所组成的向量确定为该图像特征点的特征描述子。
上述实施例中,由于不需要将第一张量转换成与输入图像尺寸相同的概率图,提高了图像特征点的提取效率。
在一个实施例中,获得针对目标场景拍摄得到的多帧图像,包括:获得由鱼眼相机拍摄的针对目标场景的多帧原始图像,对多帧原始图像进行畸变矫正,得到针对目标场景拍摄得到的多帧图像。
本实施例中,服务器获得的针对目标场景拍摄得到的多帧图像时通过鱼眼相机拍摄得到的,鱼眼相机成像模型近似为单位球面投影模型。一般将鱼眼相机成像过程分解成两步:先将三维空间点线性的投影到虚拟单位球面上;随后将单位球面上的点投影到图像平面上,这个过程是非线性的。鱼眼相机的设计引入了畸变,因此鱼眼相机所成影像存在畸变,其中径向畸变非常严重,因此其畸变模型主要考虑径向畸变。鱼眼相机的投影函数是为了尽可能将庞大的场景投影到有限的图像平面所设计的。根据投影函数的不同将鱼眼相机的设计模型大致分为等距投影模型、等立体角投影模型、正交投影模型和体视投影模型四种。本申请实施例中,可以采用这四种模型中的任意一种对鱼眼相机拍摄得到的多帧原始图像进行畸变矫正,得到针对目标场景拍摄得到的多帧图像。
上述实施例中,由于多帧图像是由鱼眼相机拍摄得到的,鱼眼相机相较于针孔相机视角更大,可以感知更多的环境信息,提取更多的图像特征点,从而进一步提升所生成的特征地图的鲁棒性,进而提升定位精度。
在一个实施例中,如图7所示,为应用本申请实施例生成的特征地图进行定位信息确定的流程示意图,包括以下步骤:
步骤702,获取待定位的运动设备的惯性测量数据、速度测量数据以及运动设备在目标场景中拍摄得到的目标图像,利用惯性测量数据和速度测量数据确定待定位的运动设备的初始位姿。
其中,惯性测量数据可以是通过(Inertial Measurement Unit,IMU)测量得到的数据。速度测量数据可以是通过速度传感器测量得到数据,例如,当待定位的运动设备为车辆时,速度测量数据可以是通过轮速计测量得到的数据。这里的惯性测量数据以及速度测量数据均是在待定位的运动设备在目标场景中运动时测量得到的数据。
具体地,服务器可以接收待定位的运动设备发送的惯性测量数据、速度测量数据以及待定位的运动设备在目标场景中拍摄得到的目标图像,基于预设的运动学模型,利用惯性测量数据和速度测量数据计算待定位的运动设备的初始位姿。其中,预设的运动学模型可以反映车辆位置、速度、加速度等与时间的关系,本实施例对该模型的具体形式不作限定,在实际应用中,可以根据需求进行合理设置,例如可以在现有的单车模型上进行改进,得到所需模型。
步骤704,基于初始位姿从已生成的特征地图中确定位置匹配的空间特征点,得到目标空间特征点。
在一个实施例中,服务器可以根据初始位姿所表征的位置从特征地图中找到位置匹配的空间特征点,作为目标空间特征点。在其他实施例中,特征地图中还包括保存各个空间特征点对应的位姿,空间特征点对应的位姿可以是在生成特征地图的过程中,运动设备在拍摄多帧图像时的位姿,进而在确定定位信息的过程中,服务器可以将待定位的运动设备的初始位姿和各个空间特征点各自对应的位姿进行比对,将匹配度最高的位姿对应的空间特征点确定为目标特征点。
步骤706,从目标图像上确定与目标空间特征点匹配的图像特征点,将确定的图像特征点与目标空间特征点组成匹配对,根据匹配对确定运动设备的定位信息。
具体地,服务器可以将目标空间特征点对应的描述子和目标图像上各个图像特征点各自对应的特征描述子进行比对,将相似度最高的特征描述子对应的图像特征点确定为与目标空间特征点匹配的图像特征点,将确定的图像特征点与目标空间特征点组成匹配对,进而可以根据匹配对确定运动设备的定位信息。其中,目标空间特征点对应的描述子可以是目标空间特征点对应的特征点集合中各个图像特征点的特征描述子的平均值。
在一个实施例中,基于匹配对确定定位信息具体可以采用PnP算法,PnP算法是一种现有方法,此处不作详细阐述。在另一个实施例中,基于匹配对确定定位信息具体包括:将匹配对中的空间特征点投影到目标图像上,得到投影特征点,通过投影特征点和匹配对中的图像特征点,计算重投影误差,将重投影误差的最小二乘函数取值最小时对应的位姿, 确定为修正位姿,通过所述修正位姿对所述初始位姿进行修正,得到定位信息。进一步地,服务器可以将定位信息返回至待定位的运动设备。
上述实施例中,由于在生成特征地图的过程中,基于图像特征点的特征描述子对图像特征点进行了位置优化,生成的特征地图更加鲁棒,从而利用该特征地图在定位的过程中定位精度得到了大大提升。
在一个具体的实施例中,本申请的特征地图生成方法可以应用于泊车应用场景中,具体包括以下步骤:
一、服务器生成特征地图
1、获得由鱼眼相机拍摄的针对目标场景的多帧原始图像,对多帧原始图像进行畸变矫正,得到针对目标场景拍摄得到的多帧图像。
具体地,安装有鱼眼相机的目标车辆可以在车库中行驶,通过鱼眼相机对车库中的环境进行拍摄,得到多帧原始图像,并发送至服务器,服务器对多帧原始图像进行畸变矫正,得到针对目标场景拍摄得到的多帧图像。
需要说明的是,这里的目标车辆与需要泊车的车辆可以是相同的车辆,也可以是不同的车辆。
2、从各帧图像上分别提取图像特征点,基于提取的图像特征点在所属图像的位置确定对应的特征描述子。
具体地,对于每一帧图像,服务器可以将该图像输入已训练的特征提取模型中,通过特征提取模型输出图像特征点对应的第一张量以及特征描述子对应的第二张量;第一张量用于描述图像中各个区域出现特征点的可能性;基于第一张量对图像进行非极大值抑制处理,以从图像中确定图像的图像特征点;将第二张量转换为与图像尺寸一致的第三张量,将第三张量中与图像特征点在所属图像中的位置匹配处的向量确定为图像特征点对应的描述子。
其中,第一张量包括多个通道,基于第一张量对图像进行非极大值抑制处理,以从图像中确定图像的图像特征点,包括:沿多个通道的方向获取第一张量在各个位置处的最大值以及各个最大值对应的通道索引,分别得到第三张量和第四张量;从第三张量中确定目标数值,并对第三张量中目标数值所在位置的邻域进行搜索,目标数值所在位置的邻域包括多个目标位置,目标位置在图像中的对应位置,与目标数值所在位置在图像中的对应位置之间的图像距离小于预设距离阈值;在搜索结果指示目标数值大于邻域内的其他位置对应的数值的情况下,将图像中与目标数值所在位置相对应的目标像素点确定为图像的图像特征点;目标像素点是基于目标数值所在位置及对应的通道索引值从图像中确定的,通道索引值是基于目标数值所在位置从第四张量中确定。
3、将各帧图像的图像特征点中具有匹配关系的图像特征点,组成特征点集合。
4、对于每一个特征点集合,在该特征点集合满足过滤条件的情况下,对特征点集合进 行过滤;在该特征点集合不满足过滤条件的情况下,进入步骤5。其中,过滤条件包括以下至少一种:基于特征点集合计算得到的初始空间特征点与多帧图像的拍摄设备之间的距离大于第一预设距离阈值;基于特征点集合计算得到的初始空间特征点与多帧图像的拍摄设备之间的距离小于第二预设距离阈值,第二预设距离阈值小于第一预设距离阈值;基于特征点集合计算得到的视差大于预设视差阈值;基于特征点集合计算得到的平均重投影误差大于预设误差阈值。
5、从该特征点集合中确定代表特征点,计算该特征点集合中剩余的图像特征点对应的特征描述子与代表特征点对应的特征描述子之间的差异。
具体地,服务器通过以下步骤从特征点集合中确定代表特征点:基于特征点集合中的各个图像特征点在所属图像中的位置,计算特征点集合对应的平均特征点位置;从特征点集合中确定与平均特征点位置之间的距离满足距离条件的图像特征点,将确定的图像特征点作为代表特征点;其中,距离条件包括与平均特征点位置之间的距离小于或者距离阈值,或者按照与平均特征点位置之间的距离升序排列时排序在排序阈值之前中的其中一种。
6、将该特征点集合中各个剩余的图像特征点分别作为目标特征点,分别计算各个目标特征点和代表特征点之间的匹配置信度,基于各个目标特征点各自对应的匹配置信度和差异,计算各个目标特征点各自对应的位置误差,统计各个目标特征点各自对应的位置误差,得到该特征点集合的位置误差。
其中,分别计算各个目标特征点和代表特征点之间的匹配置信度,包括:分别获取各个目标特征点各自的特征描述子,并获取代表特征点的特征描述子;分别计算各个目标特征点各自的特征描述子和代表特征点的特征描述子之间的向量相似度,将各个向量相似度作为各自对应的目标特征点与代表特征点之间的匹配置信度。
7、基于位置误差迭代更新该特征点集合中剩余的图像特征点,当满足迭代停止条件时,得到更新后的特征点集合。
具体地,服务器可以朝着最小化位置误差的方向,采用梯度下降算法更新该特征点集合中剩余的图像特征点的位置,从第三张量中确定所得到的图像特征点对应的描述子,然后重新计算位置误差,不断地重复该过程,直至满足迭代停止条件。
通过以上步骤,可以得到多个更新后的特征点集合,服务器可以再次判断这些特征点集合中是否存在满足过滤条件的特征点集合,对于满足过滤条件的特征点集合进行过滤,对过滤后剩下的特征点集合,继续执行后续步骤。过滤条件可以参考上文实施例中的描述。
8、基于更新后的特征点集合中各个图像特征点在所属图像的位置,确定更新后的特征点集合对应的空间特征点,从而可以得到多个空间特征点。
9、对每一个空间特征点进行位置优化,具体包括以下步骤:
9.1、对于每一个空间特征点,基于该空间特征点对应的更新后的特征点集合中各个图像特征点各自的特征描述子,确定更新后的特征点集合对应的平均描述子。
9.2、从更新后的特征点集合中各个图像特征点各自的特征描述子中,选择与平均描述子之间的相似度满足相似度条件的特征描述子,将选择到的特征描述子作为参考描述子。
9.3、将空间特征点投影至更新后的特征点集合中各个图像特征点所属图像上,得到多个投影特征点,基于投影特征点在所属图像上的位置确定投影特征点对应的特征描述子。
9.4、基于投影特征点对应的特征描述子和参考描述子之间的差异,确定投影特征点对应的重投影误差。
9.5、统计各个投影特征点各自对应的重投影误差得到目标误差,基于目标误差迭代更新空间特征点,当满足迭代停止条件时,得到目标空间特征点,该目标特征点即为位置优化后的空间特征点。
10、基于优化后得到的各个目标空间特征点生成特征地图,保存该特征地图。
二、基于特征地图进行泊车
1、需要泊车的车辆在驶入车库入口时,可以从服务器下载该特征地图,用户可以输入所要泊车的目标车位位置,从而车辆可以基于特征地图为用户规划从车库入口至目标车位位置的泊车路线。
2、车辆按照规划的泊车路线自动行驶,在行驶过程中,车辆按照以下方式进行定位:
2.1、通过IMU获得当前惯性测量数据、通过轮速传感器获得当前速度测量数据以及通过安装于该车辆上的摄像机拍摄得到当前目标图像。
2.2、利用惯性测量数据和速度测量数据确定当前初始位姿。
2.3、基于当前初始位姿从保存的特征地图中确定位置匹配的空间特征点,得到目标空间特征点。
2.4、从目标图像上确定与目标空间特征点匹配的图像特征点,将确定的图像特征点与目标空间特征点组成匹配对,根据匹配对确定当前位置。
3、当当前位置达到目标车位位置处时,自动驶入该目标车位位置处,完成泊车。
在另一个具体的实施例中,本申请的特征地图生成方法可以应用于扫地机器人自动清扫的应用场景中,在该应用场景中,扫地机器人首先在需要清扫的区域内行走,采集该区域内的多帧图像,按照本申请实施例提供的特征地图生成方法生成特征地图,进而,在后续的自动清扫过程中,可以通过特征地图规划清扫路线,在自动清扫的过程中,基于特征地图进行自动定位,以按照规划的清扫路线执行清扫任务。
应该理解的是,虽然如上的各实施例所涉及的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,如上的各实施例所涉及的流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者 阶段的至少一部分轮流或者交替地执行。
基于同样的发明构思,本申请实施例还提供了一种用于实现上述所涉及的特征地图生成方法的特征地图生成装置。该装置所提供的解决问题的实现方案与上述方法中所记载的实现方案相似,故下面所提供的一个或多个特征地图生成装置、定位信息确定装置实施例中的具体限定可以参见上文中对于特征地图生成方法的限定,在此不再赘述。
在一个实施例中,如图8所示,提供了一种特征地图生成装置800,包括:
特征提取模块802,用于获得针对目标场景拍摄得到的多帧图像,从各帧图像上分别提取图像特征点,基于提取的图像特征点在所属图像的位置确定对应的特征描述子;
特征点集合确定模块804,用于将各帧图像的图像特征点中具有匹配关系的图像特征点,组成特征点集合;
差异计算模块806,从特征点集合中确定代表特征点,计算特征点集合中剩余的图像特征点对应的特征描述子与代表特征点对应的特征描述子之间的差异;
位置更新模块808,用于基于计算得到的差异确定特征点集合的位置误差,基于位置误差迭代更新特征点集合中剩余的图像特征点,当满足迭代停止条件时,得到更新后的特征点集合;
特征地图生成模块810,用于基于更新后的特征点集合中各个图像特征点在所属图像的位置,确定更新后的特征点集合对应的空间特征点,基于空间特征点生成特征地图,特征地图用于对待定位的运动设备在目标场景中进行定位。
上述特征地图生成装置,通过获得针对目标场景拍摄得到的多帧图像,从各帧图像上分别提取图像特征点,基于提取的图像特征点在所属图像的位置确定对应的特征描述子,将各帧图像的图像特征点中具有匹配关系的图像特征点,组成特征点集合,从特征点集合中确定代表特征点,计算特征点集合中剩余的图像特征点对应的特征描述子与代表特征点对应的特征描述子之间的差异,基于计算得到的差异确定特征点集合的位置误差,基于位置误差迭代更新特征点集合中剩余的图像特征点,当满足迭代停止条件时,得到更新后的特征点集合;基于更新后的特征点集合中各个图像特征点在所属图像的位置,确定更新后的特征点集合对应的空间特征点,基于空间特征点生成特征地图,由于在生成特征地图的过程中,基于图像特征点的特征描述子对图像特征点进行了位置优化,可以使得生成的特征地图更加鲁棒,从而利用该特征地图在定位的过程中定位精度得到了大大提升。
在一个实施例中,位置更新模块808,用于将特征点集合中各个剩余的图像特征点分别作为目标特征点,分别计算各个目标特征点和代表特征点之间的匹配置信度;基于各个目标特征点各自对应的匹配置信度和差异,计算各个目标特征点各自对应的位置误差;统计各个目标特征点各自对应的位置误差,得到特征点集合的位置误差。
在一个实施例中,位置更新模块808,还用于分别获取各个目标特征点各自的特征描述子,并获取代表特征点的特征描述子;分别计算各个目标特征点各自的特征描述子和代 表特征点的特征描述子之间的向量相似度,将各个向量相似度作为各自对应的目标特征点与代表特征点之间的匹配置信度。
在一个实施例中,差异计算模块806,还用于基于特征点集合中的各个图像特征点在所属图像中的位置,计算特征点集合对应的平均特征点位置;从特征点集合中确定与平均特征点位置之间的距离满足距离条件的图像特征点,将确定的图像特征点作为代表特征点;其中,距离条件包括与平均特征点位置之间的距离小于或者等于距离阈值,或者按照与平均特征点位置之间的距离升序排列时排序在排序阈值之前中的其中一种。
在一个实施例中,特征点集合包括多个;差异计算模块806,还用于对于每一个特征点集合,在特征点集合满足过滤条件的情况下,对特征点集合进行过滤;在特征点集合不满足过滤条件的情况下,进入从特征点集合中确定代表特征点的步骤;其中,过滤条件包括以下至少一种:基于特征点集合计算得到的初始空间特征点与多帧图像的拍摄设备之间的距离大于第一预设距离阈值;基于特征点集合计算得到的初始空间特征点与多帧图像的拍摄设备之间的距离小于第二预设距离阈值,第二预设距离阈值小于第一预设距离阈值基于特征点集合计算得到的视差大于预设视差阈值;基于特征点集合计算得到的平均重投影误差大于预设误差阈值。
在一个实施例中,特征地图生成模块,还用于:基于更新后的特征点集合中各个图像特征点各自的特征描述子,确定更新后的特征点集合对应的平均描述子;从更新后的特征点集合中各个图像特征点各自的特征描述子中,选择与平均描述子之间的相似度满足相似度条件的特征描述子,将选择到的特征描述子作为参考描述子;将空间特征点投影至更新后的特征点集合中各个图像特征点所属图像上,得到多个投影特征点,基于投影特征点在所属图像上的位置确定投影特征点对应的特征描述子;基于投影特征点对应的特征描述子和参考描述子之间的差异,确定投影特征点对应的重投影误差;统计各个投影特征点各自对应的重投影误差得到目标误差,基于目标误差迭代更新空间特征点,当满足迭代停止条件时,得到更新后的特征点集合对应的目标空间特征点,基于目标空间特征点生成特征地图。
在一个实施例中,特征提取模块,还用于将图像输入已训练的特征提取模型中,通过特征提取模型输出图像特征点对应的第一张量以及特征描述子对应的第二张量;第一张量用于描述图像中各个区域出现特征点的可能性;基于第一张量对图像进行非极大值抑制处理,以从图像中确定图像的图像特征点;将第二张量转换为与图像尺寸一致的第三张量,将第三张量中与图像特征点在所属图像中的位置匹配处的向量确定为图像特征点对应的描述子。
在一个实施例中,第一张量包括多个通道,特征提取模块,还用于:沿多个通道的方向获取第一张量在各个位置处的最大值以及各个最大值对应的通道索引,分别得到第三张量和第四张量;从第三张量中确定目标数值,并对第三张量中目标数值所在位置的邻域进 行搜索,目标数值所在位置的邻域包括多个目标位置,目标位置在图像中的对应位置,与目标数值所在位置在图像中的对应位置之间的图像距离小于预设距离阈值;在搜索结果指示目标数值大于邻域内的其他位置对应的数值的情况下,将图像中与目标数值所在位置相对应的目标像素点确定为图像的图像特征点;目标像素点是基于目标数值所在位置及对应的通道索引值从图像中确定的,通道索引值是基于目标数值所在位置从第四张量中确定。
In one embodiment, the feature extraction module is further configured to obtain multiple original frames of the target scene captured by a fisheye camera and perform distortion correction on the multiple original frames to obtain the multiple frames of images captured of the target scene.
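A minimal undistortion sketch using OpenCV's fisheye module; the intrinsic matrix and distortion coefficients are placeholders and must come from camera calibration, and reusing the original intrinsics for the rectified view is an assumption.

```python
import cv2
import numpy as np

def undistort_fisheye(frames, K, D, size):
    """Rectify fisheye frames before feature extraction.

    K: 3x3 intrinsic matrix, D: 4x1 fisheye distortion coefficients, size: (width, height).
    """
    new_K = K.copy()   # assumption: keep the original intrinsics for the rectified images
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(
        K, D, np.eye(3), new_K, size, cv2.CV_16SC2)
    return [cv2.remap(f, map1, map2, interpolation=cv2.INTER_LINEAR) for f in frames]
```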
In one embodiment, the apparatus further includes a positioning information determination module configured to obtain the inertial measurement data and speed measurement data of a moving device to be positioned and a target image captured by the moving device in the target scene, determine the initial pose of the moving device using the inertial measurement data and the speed measurement data, determine position-matched spatial feature points from the generated feature map based on the initial pose to obtain target spatial feature points, determine on the target image the image feature points that match the target spatial feature points, form matching pairs from the determined image feature points and the target spatial feature points, and determine the positioning information of the moving device from the matching pairs.
Each module of the above feature map generation apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, an input/output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected via a system bus, and the communication interface is connected to the system bus via the input/output interface. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides the environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store feature map data. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a feature map generation method.
In one embodiment, a computer device is provided. The computer device may be a terminal installed in the above moving device, for example a vehicle-mounted terminal, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit and an input apparatus. The processor, the memory and the input/output interface are connected via a system bus, and the communication interface, the display unit and the input apparatus are connected to the system bus via the input/output interface. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides the environment for running them. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with external terminals; the wireless communication may be implemented through Wi-Fi, a mobile cellular network, NFC (near-field communication) or other technologies. When executed by the processor, the computer program implements a feature map generation method. The display unit of the computer device forms a visually perceivable picture and may be a display screen, a projection apparatus or a virtual-reality imaging apparatus; the display screen may be a liquid-crystal display or an electronic-ink display. The input apparatus of the computer device may be a touch layer covering the display screen, a key, trackball or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse or the like.
Those skilled in the art will understand that the structures shown in FIG. 9 and FIG. 10 are merely block diagrams of partial structures related to the solution of this application and do not limit the computer devices to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, implement the steps of the above feature map generation method.
In one embodiment, a computer-readable storage medium is provided, storing computer-readable instructions that, when executed by a processor, implement the steps of the above feature map generation method.
In one embodiment, a computer program product is provided, including computer-readable instructions that, when executed by a processor, implement the steps of the above feature map generation method.
It should be noted that the user information (including but not limited to user device information and user personal information) and the data (including but not limited to data used for analysis, stored data and displayed data) involved in this application are information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments may be implemented by computer-readable instructions instructing relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive RAM (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory and the like. Volatile memory may include random access memory (RAM) or external cache memory and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database; non-relational databases may include blockchain-based distributed databases and the like, without being limited thereto. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices and the like, without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be regarded as falling within the scope of this specification.
The above embodiments express only several implementations of this application and are described in a relatively specific and detailed manner, but they are not to be construed as limiting the scope of the patent application. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (20)

  1. A feature map generation method, performed by a computer device, the method comprising:
    obtaining a plurality of frames of images captured of a target scene, extracting image feature points from each frame of image, and determining corresponding feature descriptors based on positions of the extracted image feature points in the images to which they belong;
    grouping image feature points of the frames that have a matching relationship into a feature point set;
    determining a representative feature point from the feature point set, and calculating differences between feature descriptors of the remaining image feature points in the feature point set and a feature descriptor of the representative feature point;
    determining a position error of the feature point set based on the calculated differences, iteratively updating the remaining image feature points in the feature point set based on the position error, and obtaining an updated feature point set when an iteration stop condition is met; and
    determining, based on positions of the image feature points in the updated feature point set in the images to which they belong, a spatial feature point corresponding to the updated feature point set, and generating a feature map based on the spatial feature point, the feature map being used to position a moving device to be positioned in the target scene.
  2. The method according to claim 1, wherein the determining a position error of the feature point set based on the calculated differences comprises:
    taking each remaining image feature point in the feature point set as a target feature point, and calculating a match confidence between each target feature point and the representative feature point;
    calculating a position error of each target feature point based on the match confidence and the difference corresponding to the target feature point; and
    aggregating the position errors of the target feature points to obtain the position error of the feature point set.
  3. The method according to claim 2, wherein the calculating a match confidence between each target feature point and the representative feature point comprises:
    obtaining the feature descriptor of each target feature point and the feature descriptor of the representative feature point; and
    calculating a vector similarity between the feature descriptor of each target feature point and the feature descriptor of the representative feature point, and taking each vector similarity as the match confidence between the corresponding target feature point and the representative feature point.
  4. The method according to claim 1, wherein the determining a representative feature point from the feature point set comprises:
    calculating an average feature point position of the feature point set based on positions of the image feature points in the feature point set in the images to which they belong; and
    determining, from the feature point set, an image feature point whose distance to the average feature point position satisfies a distance condition, and taking the determined image feature point as the representative feature point;
    wherein the distance condition comprises one of the following: the distance to the average feature point position is less than or equal to a distance threshold, or the image feature point ranks before a rank threshold when the image feature points are sorted in ascending order of distance to the average feature point position.
  5. The method according to any one of claims 1 to 4, wherein there are a plurality of feature point sets, and the determining a representative feature point from the feature point set comprises:
    for each feature point set, filtering out the feature point set when the feature point set satisfies a filtering condition;
    wherein the filtering condition comprises at least one of the following:
    a distance between an initial spatial feature point computed from the feature point set and a capture device of the plurality of frames of images is greater than a first preset distance threshold;
    the distance between the initial spatial feature point computed from the feature point set and the capture device of the plurality of frames of images is less than a second preset distance threshold, the second preset distance threshold being smaller than the first preset distance threshold;
    a parallax computed from the feature point set is greater than a preset parallax threshold; and
    an average reprojection error computed from the feature point set is greater than a preset error threshold.
  6. The method according to claim 5, further comprising:
    proceeding to the operation of determining a representative feature point from the feature point set when the feature point set does not satisfy the filtering condition.
  7. The method according to claim 1, wherein the generating a feature map based on the spatial feature point comprises:
    determining an average descriptor corresponding to the updated feature point set based on the feature descriptors of the image feature points in the updated feature point set;
    selecting, from the feature descriptors of the image feature points in the updated feature point set, a feature descriptor whose similarity to the average descriptor satisfies a similarity condition, and taking the selected feature descriptor as a reference descriptor;
    projecting the spatial feature point onto the images to which the image feature points in the updated feature point set belong to obtain a plurality of projected feature points, and determining feature descriptors of the projected feature points based on positions of the projected feature points in the images to which they belong;
    determining reprojection errors of the projected feature points based on differences between the feature descriptors of the projected feature points and the reference descriptor; and
    aggregating the reprojection errors of the projected feature points to obtain a target error, iteratively updating the spatial feature point based on the target error, obtaining, when an iteration stop condition is met, a target spatial feature point corresponding to the updated feature point set, and generating the feature map based on the target spatial feature point.
  8. The method according to claim 1, wherein the extracting image feature points from each frame of image and determining corresponding feature descriptors based on positions of the extracted image feature points in the images to which they belong comprises:
    inputting the image into a trained feature extraction model, and outputting, through the feature extraction model, a first tensor corresponding to image feature points and a second tensor corresponding to feature descriptors, the first tensor describing a likelihood of a feature point appearing in each region of the image;
    performing non-maximum suppression on the image based on the first tensor to determine the image feature points of the image; and
    converting the second tensor into a third tensor whose size is consistent with the image, and determining, as the descriptor of each image feature point, the vector in the third tensor at the position matching the position of the image feature point in the image to which it belongs.
  9. The method according to claim 8, wherein the first tensor comprises a plurality of channels, and the performing non-maximum suppression on the image based on the first tensor to determine the image feature points of the image comprises:
    obtaining, along the direction of the plurality of channels, a maximum value of the first tensor at each position and a channel index corresponding to each maximum value, to obtain a third tensor and a fourth tensor respectively;
    determining a target value from the third tensor, and searching a neighbourhood of the position of the target value in the third tensor, the neighbourhood comprising a plurality of target positions, wherein the image distance between the position in the image corresponding to each target position and the position in the image corresponding to the position of the target value is less than a preset distance threshold; and
    determining, when a search result indicates that the target value is greater than the values corresponding to the other positions in the neighbourhood, a target pixel in the image corresponding to the position of the target value as an image feature point of the image;
    wherein the target pixel is determined from the image based on the position of the target value and a corresponding channel index value, the channel index value being determined from the fourth tensor based on the position of the target value.
  10. The method according to any one of claims 1 to 9, wherein the obtaining a plurality of frames of images captured of a target scene comprises:
    obtaining a plurality of original frames of the target scene captured by a fisheye camera, and performing distortion correction on the plurality of original frames to obtain the plurality of frames of images captured of the target scene.
  11. The method according to any one of claims 1 to 9, wherein the plurality of frames of images are captured by a camera mounted on a target moving device, and the method further comprises:
    obtaining inertial measurement data and speed measurement data of the target moving device at the time of capturing the plurality of frames of images, and calculating an initial pose of the target moving device using the inertial measurement data and the speed measurement data; and
    determining pre-integration information based on the inertial measurement data, constructing a factor graph based on the pre-integration information and the speed measurement data, and adjusting the initial pose based on the factor graph to obtain a target pose;
    wherein the generating a feature map based on the spatial feature point comprises:
    establishing a correspondence between the spatial feature point and the target pose, and generating the feature map based on the correspondence and the spatial feature point.
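Purely as an illustrative toy, not the claimed construction: the sketch below shows a 2D pose adjustment in which IMU pre-integration acts as a relative-yaw factor and wheel speed acts as a travelled-distance factor; a real system would build a full factor graph and solve it with a nonlinear optimizer.

```python
import numpy as np

def preintegrate_yaw(gyro_yaw_rates, dt):
    """Toy IMU pre-integration: accumulate the yaw change between two poses."""
    return float(np.sum(gyro_yaw_rates) * dt)

def adjust_pose(initial_pose, prev_pose, delta_yaw, wheel_speed, dt):
    """Toy 'factor' adjustment of a 2D pose (x, y, yaw):
    the pre-integration factor constrains the relative yaw,
    the wheel-speed factor constrains the travelled distance."""
    x0, y0, yaw0 = prev_pose
    x, y, _ = initial_pose
    yaw_adj = yaw0 + delta_yaw                     # satisfy the pre-integration factor
    step = np.hypot(x - x0, y - y0)
    heading = np.arctan2(y - y0, x - x0) if step > 1e-9 else yaw_adj
    dist = wheel_speed * dt                        # satisfy the wheel-speed factor
    return np.array([x0 + dist * np.cos(heading),
                     y0 + dist * np.sin(heading),
                     yaw_adj])
```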
  12. The method according to any one of claims 1 to 9, wherein the method comprises:
    obtaining inertial measurement data and speed measurement data of a moving device to be positioned and a target image captured by the moving device in the target scene, and determining an initial pose of the moving device to be positioned using the inertial measurement data and the speed measurement data;
    determining, based on the initial pose, position-matched spatial feature points from the feature map to obtain target spatial feature points; and
    determining, on the target image, image feature points that match the target spatial feature points, forming matching pairs from the determined image feature points and the target spatial feature points, and determining positioning information of the moving device based on the matching pairs.
  13. The method according to claim 12, wherein the determining positioning information of the moving device based on the matching pairs comprises:
    projecting the spatial feature points in the matching pairs onto the target image to obtain projected feature points;
    calculating reprojection errors from the projected feature points and the image feature points in the matching pairs; and
    determining, as a corrected pose, the pose at which a least-squares function of the reprojection errors takes its minimum value, and correcting the initial pose with the corrected pose to obtain the positioning information.
  14. The method according to claim 12, wherein the moving device to be positioned comprises a vehicle to be parked or a sweeping robot.
  15. A feature map generation apparatus, comprising:
    a feature extraction module, configured to obtain a plurality of frames of images captured of a target scene, extract image feature points from each frame of image, and determine corresponding feature descriptors based on positions of the extracted image feature points in the images to which they belong;
    a feature point set determination module, configured to group image feature points of the frames that have a matching relationship into a feature point set;
    a difference calculation module, configured to determine a representative feature point from the feature point set, and calculate differences between feature descriptors of the remaining image feature points in the feature point set and a feature descriptor of the representative feature point;
    a position update module, configured to determine a position error of the feature point set based on the calculated differences, iteratively update the remaining image feature points in the feature point set based on the position error, and obtain an updated feature point set when an iteration stop condition is met; and
    a feature map generation module, configured to determine, based on positions of the image feature points in the updated feature point set in the images to which they belong, a spatial feature point corresponding to the updated feature point set, and generate a feature map based on the spatial feature point, the feature map being used to position a moving device to be positioned in the target scene.
  16. The apparatus according to claim 15, wherein the feature map generation module is further configured to:
    determine an average descriptor corresponding to the updated feature point set based on the feature descriptors of the image feature points in the updated feature point set;
    select, from the feature descriptors of the image feature points in the updated feature point set, a feature descriptor whose similarity to the average descriptor satisfies a similarity condition, and take the selected feature descriptor as a reference descriptor;
    project the spatial feature point onto the images to which the image feature points in the updated feature point set belong to obtain a plurality of projected feature points, and determine feature descriptors of the projected feature points based on positions of the projected feature points in the images to which they belong;
    determine reprojection errors of the projected feature points based on differences between the feature descriptors of the projected feature points and the reference descriptor; and
    aggregate the reprojection errors of the projected feature points to obtain a target error, iteratively update the spatial feature point based on the target error, obtain, when an iteration stop condition is met, a target spatial feature point corresponding to the updated feature point set, and generate the feature map based on the target spatial feature point.
  17. The apparatus according to claim 15, wherein the position update module is further configured to take each remaining image feature point in the feature point set as a target feature point, calculate a match confidence between each target feature point and the representative feature point, calculate a position error of each target feature point based on the match confidence and the difference corresponding to the target feature point, and aggregate the position errors of the target feature points to obtain the position error of the feature point set.
  18. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the method according to any one of claims 1 to 14.
  19. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 14.
  20. A computer program product, comprising computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 14.
PCT/CN2023/097112 2022-08-08 2023-05-30 特征地图生成方法、装置、存储介质和计算机设备 WO2024032101A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210945938.5 2022-08-08
CN202210945938.5A CN117576494A (zh) 2022-08-08 2022-08-08 特征地图生成方法、装置、存储介质和计算机设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/633,850 Continuation US20240257501A1 (en) 2022-08-08 2024-04-12 Feature map generation method and apparatus, storage medium, and computer device

Publications (1)

Publication Number Publication Date
WO2024032101A1 true WO2024032101A1 (zh) 2024-02-15

Family

ID=89850552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097112 WO2024032101A1 (zh) 2022-08-08 2023-05-30 特征地图生成方法、装置、存储介质和计算机设备

Country Status (2)

Country Link
CN (1) CN117576494A (zh)
WO (1) WO2024032101A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200005487A1 (en) * 2018-06-28 2020-01-02 Ubtech Robotics Corp Ltd Positioning method and robot using the same
CN110648397A (zh) * 2019-09-18 2020-01-03 Oppo广东移动通信有限公司 场景地图生成方法、装置、存储介质及电子设备
WO2020108285A1 (zh) * 2018-11-30 2020-06-04 华为技术有限公司 地图构建方法、装置及***、存储介质
CN111780764A (zh) * 2020-06-30 2020-10-16 杭州海康机器人技术有限公司 一种基于视觉地图的视觉定位方法、装置
CN114565777A (zh) * 2022-02-28 2022-05-31 维沃移动通信有限公司 数据处理方法和装置
CN114674328A (zh) * 2022-03-31 2022-06-28 北京百度网讯科技有限公司 地图生成方法、装置、电子设备、存储介质、及车辆


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117990058A (zh) * 2024-04-07 2024-05-07 国网浙江省电力有限公司宁波供电公司 一种提高rtk测量精度的方法、装置、计算机设备及介质
CN117990058B (zh) * 2024-04-07 2024-06-11 国网浙江省电力有限公司宁波供电公司 一种提高rtk测量精度的方法、装置、计算机设备及介质

Also Published As

Publication number Publication date
CN117576494A (zh) 2024-02-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23851329

Country of ref document: EP

Kind code of ref document: A1