WO2021083242A1 - Map construction method, positioning method and system, wireless communication terminal, and computer-readable medium - Google Patents

Map construction method, positioning method and system, wireless communication terminal, and computer-readable medium

Info

Publication number
WO2021083242A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
key frame
information
matching
Prior art date
Application number
PCT/CN2020/124547
Other languages
English (en)
French (fr)
Inventor
孙莹莹
金珂
尚太章
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Priority to EP20880439.3A priority Critical patent/EP3975123A4/en
Publication of WO2021083242A1 publication Critical patent/WO2021083242A1/zh
Priority to US17/561,307 priority patent/US20220114750A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7753Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • G06V20/647Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the embodiments of the present application relate to the field of map construction and positioning technology, and more specifically, to a map construction method, a positioning method, a positioning system, a wireless communication terminal, and a computer-readable medium.
  • visual information can be used to build an environment map to assist users in perceiving the surrounding environment and quickly locate their own location.
  • the existing technology also has certain deficiencies in the process of constructing maps and positioning.
  • the prior art solution only considers traditional image features in the process of environmental mapping and image matching, and the traditional image features have poor anti-noise ability, resulting in a low positioning success rate.
  • the prior art mostly considers only the two-dimensional features of the visual image in the positioning process, so the degrees of freedom of the positioning result are limited; there is also the problem of poor positioning robustness.
  • the embodiments of the present application provide a map construction method, a positioning method or system, a wireless communication terminal, and a computer readable medium, which can perform map construction and positioning based on the image characteristics of the collected environmental images.
  • In a first aspect, a method for constructing a map is provided, including: collecting an environment image of the current environment; acquiring image feature information of the environment image, and performing feature point matching on the consecutive environment images according to the image feature information to filter key frame images, wherein the image feature information includes feature point information and corresponding descriptor information; obtaining depth information corresponding to the matched feature points in the key frame images to construct three-dimensional feature information of the key frame images; and constructing map data of the current environment based on the key frame images, wherein the map data includes the image feature information and three-dimensional feature information corresponding to the key frame images.
  • In a second aspect, a positioning method is provided, including: in response to a positioning instruction, collecting a target image; extracting image feature information of the target image, wherein the image feature information includes feature point information and descriptor information of the target image; matching the image feature information with each key frame image in map data to determine a matching frame image; and generating the current positioning result corresponding to the target image according to the matching frame image.
  • In a third aspect, a positioning system is provided, including: a positioning instruction response module, configured to collect a target image in response to a positioning instruction; an image feature recognition module, configured to extract image feature information of the target image, wherein the image feature information includes feature point information and descriptor information of the target image; a matching frame screening module, configured to match the image feature information with each key frame image in map data to determine a matching frame image; and a positioning result generating module, configured to generate the current positioning result corresponding to the target image according to the matching frame image.
  • In a fourth aspect, a computer-readable medium is provided for storing computer software instructions used to execute the method in the first aspect or the second aspect, where the computer software instructions include the programs designed to execute the above aspects.
  • In a fifth aspect, a wireless communication terminal is provided, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to execute the method in the first aspect or the second aspect.
  • Fig. 1 shows a schematic diagram of a map construction method according to an embodiment of the present application.
  • Fig. 2 shows a schematic flowchart of another map construction method according to an embodiment of the present application.
  • FIG. 3 shows a schematic flowchart of a positioning method according to an embodiment of the present application.
  • FIG. 4 shows a schematic flowchart of another positioning method according to an embodiment of the present application.
  • Fig. 5 shows a schematic diagram of a key frame image matching result according to an embodiment of the present application.
  • Fig. 6 shows a schematic diagram of the solution principle of the PnP model in an embodiment of the present application.
  • Fig. 7 shows a schematic block diagram of a positioning system according to an embodiment of the present application.
  • Fig. 8 shows a schematic block diagram of a map construction device according to an embodiment of the present application.
  • Fig. 9 shows a schematic block diagram of a computer system of a wireless communication terminal according to an embodiment of the present application.
  • Fig. 1 shows a schematic diagram of a map construction method according to an embodiment of the present application. As shown in Figure 1, the method includes some or all of the following:
  • S11 Collect an environment image of the current environment;
  • S12 Acquire image feature information of the environment image, and perform feature point matching on the consecutive environment images according to the image feature information to filter key frame images; wherein the image feature information includes feature point information and corresponding descriptor information;
  • S13 Acquire depth information corresponding to the matched feature points in the key frame images, so as to construct three-dimensional feature information of the key frame images;
  • S14 Construct map data of the current environment based on the key frame images; wherein the map data includes the image feature information and three-dimensional feature information corresponding to the key frame images.
  • When building a map of an indoor or outdoor environment, a monocular camera can be used to continuously collect environment images at a certain frequency.
  • the collected environmental image may be an RGB environmental image.
  • In the current environment, the monocular camera can be controlled to collect environment images at a frequency of 10-20 frames per second while moving at a certain speed, so as to obtain all the environment images corresponding to the current environment.
  • The trained SuperPoint-based feature extraction model can be used to perform feature extraction on the environment images in real time, so as to obtain the image feature information of each environment image.
  • the image feature information includes feature point information and corresponding descriptor information.
  • acquiring the image feature information of the environmental image by using the trained SuperPoint-based feature extraction model may include:
  • S1211 Use an encoder to encode the environment image to obtain feature encoding
  • S1212 Input the feature code into an interest point encoder to obtain the feature point information corresponding to the environment image;
  • S1213 Input the feature code into a descriptor encoder to obtain the descriptor information corresponding to the environment image.
  • the feature extraction model based on SuperPoint may include an encoding part and a decoding part.
  • the input image can be a full-size image, and the dimensionality of the input image is reduced by an encoder to obtain a reduced-dimensional feature map, that is, feature encoding.
  • The decoding part may include an interest point encoder and a descriptor encoder.
  • The interest point encoder decodes the feature encoding and outputs feature point information with the same size as the environment image; the descriptor encoder decodes the feature encoding and outputs the descriptor information corresponding to the feature point information. The descriptor information describes the image features corresponding to the feature points, such as color, contour, and other information.
  • The SuperPoint-based feature extraction model may be trained in advance, and the training process may include the following steps:
  • S21 Construct a synthetic database, and use the synthetic database to train an element feature point extraction model.
  • S22 Perform random homography transformations on each original image in the MS-COCO data set to obtain the transformed images corresponding to each original image, and use the trained Magic Point model to perform feature extraction on the transformed images to obtain the ground-truth feature points of each original image in the MS-COCO data set.
  • S23 Train the SuperPoint model with each original image in the MS-COCO data set and the feature point labels corresponding to each original image as training sample data.
  • Specifically, a synthetic database containing multiple synthetic shapes can be constructed; the synthetic shapes can include simple two-dimensional figures such as quadrilaterals, triangles, lines, and ellipses. The key point positions of each two-dimensional figure are defined as its Y-junctions, L-junctions, and T-junctions, as well as the centers of ellipses and the split points of line segments. These key points can be regarded as a subset of the points of interest in the real world.
  • Use the above-mentioned synthetic database as training samples to train the Magic Point model.
  • the Magic Point model is used to extract the feature points of the basic shape elements.
  • n random homography transformations can be performed on each original image in the MS-COCO (Microsoft-Common Objects in Context) data set to obtain n corresponding transformed images.
  • The above-mentioned homography transformation refers to a transformation matrix that maps one image to another image; points of the same color in the two pictures are called correlated points. Each original image in the MS-COCO data set is taken as an input image, and a random transformation matrix is used to transform the input image to obtain a transformed image.
  • random homography can be a combination of simple transformations.
  • the trained Magic Point model is used to extract feature points of each transformed image to obtain n feature point heat maps corresponding to each original image.
  • the n feature point heat maps are accumulated, and the output images of the same source image are combined to obtain the final feature point accumulated map.
  • A preset threshold is used to screen the feature points at each position in the feature point accumulation map; the feature points with strong responses are retained and used to describe the shapes in the original image. The retained feature points are taken as the ground-truth feature points, i.e. the feature point labels, and serve as sample data for the subsequent SuperPoint model training.
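  • A minimal sketch of the label-generation loop described above is given below. It assumes a `magic_point` callable that returns a per-pixel interest-point heat map for an image (the trained Magic Point model); the random-warp helper and the accumulation threshold are illustrative assumptions, not the patent's exact values.

```python
import cv2
import numpy as np

def random_homography(w, h, jitter=0.2):
    """Compose a random perspective warp from small corner perturbations (hypothetical helper)."""
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = (src + np.random.uniform(-jitter, jitter, src.shape) * [w, h]).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)

def pseudo_labels(image, magic_point, n=100, thresh=0.015):
    """Accumulate Magic Point detections over n random homographies and keep strong responses."""
    h, w = image.shape[:2]
    acc = magic_point(image)                  # heat map on the original image
    for _ in range(n):
        H = random_homography(w, h)
        warped = cv2.warpPerspective(image, H, (w, h))
        heat = magic_point(warped)            # detect on the warped image
        # warp the heat map back to the original frame and accumulate
        acc += cv2.warpPerspective(heat, np.linalg.inv(H), (w, h))
    acc /= (n + 1)
    ys, xs = np.where(acc > thresh)           # retain strongly responding positions
    return np.stack([xs, ys], axis=1)         # feature point labels for SuperPoint training
```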
  • For the SuperPoint model, the input parameter may be a full-size image; one encoder is used to reduce the dimensionality of the input image, and two decoders are used to output the feature point information and the descriptor information respectively.
  • For example, the encoder can use a typical VGG structure, in which three consecutive max-pooling layers perform successive pooling operations on the input image and one convolutional layer performs a convolution operation, so that an input image of size H*W is transformed into a feature map of size (H/8)*(W/8).
  • the decoding part may include two branches, including a point of interest decoder and a descriptor decoder, which are respectively used to extract two-dimensional feature point information and corresponding descriptor information.
  • The interest point decoder decodes the encoded feature map, increasing the channel depth and finally reshaping the depth channels into an output with the same size as the original input image.
  • the descriptor decoder decodes the encoded feature map, and then performs bicubic interpolation and L2 normalization to obtain the final descriptor information.
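  • A compact PyTorch sketch of this encoder/two-decoder layout is shown below. The layer widths, the grayscale input, and the 65-channel interest-point head are illustrative assumptions in the style of SuperPoint, not the patent's exact configuration.

```python
import torch.nn as nn
import torch.nn.functional as F

class SuperPointLike(nn.Module):
    """VGG-style shared encoder followed by an interest-point head and a descriptor head."""
    def __init__(self, in_ch=1, desc_dim=256):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))
        # shared encoder: three 2x2 max-poolings reduce H*W to (H/8)*(W/8)
        self.enc = nn.Sequential(
            block(in_ch, 64), block(64, 64), nn.MaxPool2d(2),
            block(64, 64), block(64, 64), nn.MaxPool2d(2),
            block(64, 128), block(128, 128), nn.MaxPool2d(2),
            block(128, 128),
        )
        self.det = nn.Conv2d(128, 65, 1)        # 64 cells of an 8x8 grid + 1 "no point" bin
        self.desc = nn.Conv2d(128, desc_dim, 1) # one descriptor per 8x8 cell

    def forward(self, x):
        f = self.enc(x)
        prob = F.softmax(self.det(f), dim=1)[:, :-1]   # drop the "no point" bin
        heat = F.pixel_shuffle(prob, 8)                # reshape depth back to full image size
        d = F.interpolate(self.desc(f), scale_factor=8, mode="bicubic", align_corners=False)
        d = F.normalize(d, p=2, dim=1)                 # L2-normalized descriptors
        return heat, d
```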
  • performing feature point matching on the continuous environmental images according to the image feature information to filter the key frame images may include:
  • S1221 Use the first frame of environment image as the current key frame image, and select one or more frames of the environment image to be matched that are continuous with the current key frame image.
  • S1222 Use the descriptor information to perform feature point matching between the current key frame image and the environment image to be matched, and take the environment image to be matched whose feature point matching result is greater than a preset threshold as a key frame image.
  • S1223 Use the selected key frame image as a current key frame image, and obtain one or more frames of environmental images to be matched that are continuous with the current key frame image.
  • S1224 Perform feature point matching between this current key frame image and the environment images to be matched by using the descriptor information, so as to continuously filter the key frame images.
  • Specifically, the first frame of environment image collected by the monocular camera can be used as the initial current key frame image, and the subsequently collected environment images are screened starting from it. Since the environment images are collected continuously at a certain frequency, the difference between two or more consecutive environment images may not be large. Therefore, when screening key frame images, for the current key frame image, one environment image consecutive with the current key frame image, or 2, 3, or 5 consecutive environment images, can be selected as the environment images to be matched.
  • When performing the matching, any key frame descriptor in the current key frame image can be selected, and the Euclidean distance between it and each descriptor of the environment image to be matched is calculated. The descriptor with the smallest Euclidean distance is selected as the descriptor matching that key frame descriptor, so that the feature point in the environment image to be matched that matches the corresponding feature point in the current key frame image can be determined, and a feature point matching pair is established.
  • Each descriptor in the current key frame image is traversed in this way to obtain the matching result of each feature point in the current key frame image.
  • A fixed number of feature points can be selected from the current key frame image for matching, for example 150, 180, or 200 feature points. This avoids tracking failure caused by selecting too few feature points, and avoids degrading computation efficiency by selecting too many.
  • Alternatively, a certain number of feature points can be selected according to the objects contained in the current key frame image, for example feature points of objects that stand out in color, shape, or structure.
  • The KNN (k-Nearest Neighbor) algorithm may also be used to perform the feature point matching.
  • When the feature point matching result is greater than the preset threshold, it can be determined that the current key frame image and the environment image to be matched are tracked successfully, and the environment image to be matched is taken as a key frame image. For example, when the accuracy of the feature point matching result is greater than 70% or 75%, it is determined that the tracking is successful, and the environment image to be matched is used as a key frame image.
  • The newly selected key frame image can then be used as the new current key frame image, and the environment images to be matched corresponding to this current key frame image can be selected to continue the judgment and screening of key frame images.
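  • A sketch of the nearest-neighbour descriptor matching used for this key-frame screening is shown below, assuming descriptors are L2-normalized row vectors; the distance cutoff and the 0.75 acceptance ratio are assumptions that mirror the 70%-75% threshold mentioned above.

```python
import numpy as np

def pairwise_dist(desc_a, desc_b):
    """Euclidean distances between every descriptor in desc_a (Na x D) and desc_b (Nb x D)."""
    return np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)

def match_feature_points(desc_key, desc_cand):
    """For each key-frame descriptor, pick the candidate descriptor with the smallest distance."""
    d = pairwise_dist(desc_key, desc_cand)
    return list(enumerate(d.argmin(axis=1)))       # feature point matching pairs (i_key, j_cand)

def is_key_frame(desc_key, desc_cand, accept_ratio=0.75, max_dist=0.7):
    """Accept the candidate frame as a new key frame if enough feature points track successfully."""
    d = pairwise_dist(desc_key, desc_cand)
    tracked = (d.min(axis=1) < max_dist).mean()    # fraction of key-frame points with a close match
    return tracked > accept_ratio
```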
  • In addition, the three-dimensional information of each feature point in each key frame image may also be constructed, which may specifically include:
  • S131 Establish feature point matching pairs from the feature points that match each other between the current key frame image and the key frame image matching it;
  • S132 Calculate the depth information of the feature points corresponding to each feature point matching pair, so as to construct the three-dimensional feature information of the key frame image by using the depth information and the feature point information of the feature points.
  • When performing feature point matching, for two adjacent key frame images that match each other, a feature point matching pair can be constructed for each pair of mutually matched feature points, and the feature point matching pairs can be used to perform motion estimation.
  • the triangulation method can be used to calculate the depth information of the feature points.
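  • A sketch of recovering depth for matched feature points by triangulation with OpenCV is shown below; K is the camera intrinsic matrix, and R, t describe the relative motion between the two key frames, assumed to be available from the motion estimation step.

```python
import cv2
import numpy as np

def triangulate_depth(K, R, t, pts1, pts2):
    """Triangulate matched 2D points from two key frames; return 3D points and their depths."""
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first key frame as the reference
    P2 = K @ np.hstack([R, t.reshape(3, 1)])            # second key frame relative to the first
    X_h = cv2.triangulatePoints(P1, P2,
                                pts1.T.astype(np.float64),
                                pts2.T.astype(np.float64))   # 4xN homogeneous points
    X = (X_h[:3] / X_h[3]).T                            # Nx3 3D feature information
    return X, X[:, 2]                                   # depth is the z coordinate in frame 1
```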
  • the bag of words model may also be used to extract the image feature information of the key frame images.
  • the foregoing method may further include:
  • S13-2 Perform feature extraction on the key frame image by using the trained bag-of-words model to obtain image feature information.
  • the bag-of-words model can be trained in advance.
  • the bag-of-words model can extract features from the training pictures. For example, if the number of extracted feature types is w, each feature can be called a word; the trained bag-of-words model can include w words.
  • When it is necessary to extract the bag-of-words feature information of a key frame, each word scores the key frame, and the score value is a floating point number between 0 and 1, so that each key frame can be represented by a w-dimensional floating point vector.
  • This w-dimensional vector is the feature vector of the bag-of-words model. The scoring formula can take the standard TF-IDF form implied by the definitions below, i.e. the score of word w_i for image I_t is (n_iIt / n_It) * log(N / n_i), where:
  • N is the number of training pictures;
  • n_i is the number of times the word w_i appears in the training pictures;
  • I_t is the image I collected at time t;
  • n_iIt is the number of times the word w_i appears in the image I_t;
  • n_It is the total number of words that appear in the image I_t.
  • In this way, the bag-of-words feature information of each key frame is a w-dimensional floating-point vector.
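  • The scoring step could look like the sketch below, which assumes the standard TF-IDF weighting (the original formula is rendered as an image and is not reproduced here); `word_counts_image` holds n_iIt per word and `word_counts_corpus` holds n_i per word.

```python
import numpy as np

def bow_vector(word_counts_image, word_counts_corpus, num_training_images):
    """Score every vocabulary word for one key frame; returns the w-dimensional feature vector."""
    n_It = word_counts_image.sum()                      # total words observed in image I_t
    tf = word_counts_image / max(n_It, 1)               # n_iIt / n_It
    idf = np.log(num_training_images / np.maximum(word_counts_corpus, 1))   # log(N / n_i)
    v = tf * idf
    norm = np.linalg.norm(v, ord=1)
    return v / norm if norm > 0 else v                  # normalized so each entry lies in [0, 1]
```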
  • The training process of the above-mentioned bag-of-words model can generally include the following steps: feature detection on the training images, that is, extracting visual vocabulary from the various training images and gathering all the visual words together; feature representation, that is, constructing the word list with the K-Means algorithm; and generation of the codebook, that is, representing an image with the vocabulary in the word list.
  • the training process of the visual bag-of-words model can be realized by using a conventional method, which will not be repeated in this disclosure.
  • The key frame images and the corresponding feature point information, descriptor information, and three-dimensional feature information can be serialized and stored locally to generate offline map data.
  • The bag-of-words image feature information of the key frame images can also be stored together.
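  • One way the per-key-frame data could be serialized into an offline map file is sketched below; the record layout and the use of pickle are assumptions for illustration, not the patent's storage format.

```python
import pickle

def save_offline_map(path, keyframes):
    """keyframes: list of dicts with keys keypoints (Nx2), descriptors (NxD), points_3d (Nx3), bow (w-dim)."""
    with open(path, "wb") as f:
        pickle.dump(keyframes, f)

def load_offline_map(path):
    """Load the serialized offline map data back into memory for positioning."""
    with open(path, "rb") as f:
        return pickle.load(f)
```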
  • The map construction method of the embodiments of the present application screens the key frame images by using the feature point information of the environment images and the corresponding descriptor information, and constructs the map data from the feature point information and descriptor information of the key frame images together with the image feature information extracted by the bag-of-words model, making use of image features based on deep learning.
  • The constructed map data therefore has the advantage of strong anti-noise ability.
  • By using a variety of image features to construct the map data, positioning can still be effective under various conditions such as environment changes and illumination changes, which greatly improves positioning accuracy and robustness.
  • In addition, both the 2D and 3D information of the visual key frames is considered, so position and posture data can be provided at the same time during positioning, which improves the degrees of freedom of positioning compared with other indoor positioning methods.
  • Fig. 3 shows a schematic diagram of a positioning method according to an embodiment of the present application. As shown in Figure 3, the method includes some or all of the following:
  • the monocular camera mounted on the terminal device can be activated to collect the current target image in the RGB format.
  • the built offline map is loaded on the terminal device. For example, an offline map constructed as in the above-mentioned embodiment method.
  • the trained SuperPoint-based feature extraction model in the foregoing embodiment may be used to extract the image feature information of the target image.
  • Specifically, it can include:
  • S41 Use an encoder to encode the target image to obtain a feature encoding;
  • S42 Input the feature encoding into an interest point encoder to obtain the feature point information corresponding to the target image;
  • S43 Input the feature encoding into a descriptor encoder to obtain the descriptor information corresponding to the target image.
  • Specifically, the descriptor information can be used to perform feature point matching with each key frame image in the map data; when the feature point matching result is greater than a preset threshold, the matching is determined to be successful, and the corresponding key frame image is used as the matching frame image.
  • the pose of the terminal device may also be determined, and the pose information and other characteristic information may be used to generate the current positioning result.
  • determining the pose of the terminal device may include the following steps:
  • S61 Perform feature point matching on the target image and the matching frame image based on the image feature information to obtain target matching feature points.
  • S62 Input the three-dimensional feature information of the matched frame image and the target matching feature points into the trained PnP model to obtain posture information of the target image.
  • the pose information of the matched frame image X3 may be used as the pose information of the target image.
  • the pose estimation model can be used to calculate the pose information of the target image.
  • Specifically, with the PnP (Perspective-n-Point) model, the solvePnP function in OpenCV can be called to find the current pose of the target image XC in the map coordinate system.
  • The input parameters of the PnP model are the 3D points of the matching frame image (that is, the feature points of the key frame image in the map coordinate system) and the target matching feature points obtained by matching the projections of these 3D points in the current target image (that is, the feature points in the current target image); its output is the pose transformation of the current target image relative to the origin of the map coordinate system (that is, the pose of the current target image in the map coordinate system).
  • The calculation principle may include the following content: referring to FIG. 6, suppose the center of the current coordinate system is point O, and A, B, and C are three 3D feature points. According to the law of cosines, relations between the distances OA, OB, OC and the angles between the corresponding viewing rays can be written down.
  • In these relations, w, v, cos<a,c>, cos<b,c>, and cos<a,b> are known quantities, and x and y are unknown quantities.
  • The values of x and y can be obtained from the resulting two equations, and then the values of OA, OB, and OC can be solved according to the corresponding formulas.
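  • For reference, a common way to write out the law-of-cosines system sketched above is given below (the patent's own equations are rendered as images; the variable naming here follows the usual P3P derivation and may differ slightly from the original):

```latex
\begin{aligned}
OA^2 + OB^2 - 2\,OA\cdot OB\cos\langle a,b\rangle &= AB^2\\
OB^2 + OC^2 - 2\,OB\cdot OC\cos\langle b,c\rangle &= BC^2\\
OA^2 + OC^2 - 2\,OA\cdot OC\cos\langle a,c\rangle &= AC^2
\end{aligned}
```

Dividing by OC^2 and writing x = OA/OC, y = OB/OC, v = AB^2/OC^2, u = BC^2/AB^2, w = AC^2/AB^2 gives two equations in the unknowns x and y:

```latex
\begin{aligned}
(1-u)\,y^2 - u\,x^2 - 2y\cos\langle b,c\rangle + 2uxy\cos\langle a,b\rangle + 1 &= 0\\
(1-w)\,x^2 - w\,y^2 - 2x\cos\langle a,c\rangle + 2wxy\cos\langle a,b\rangle + 1 &= 0
\end{aligned}
```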
  • With these distances known, the camera pose can be solved from the transformation between the map coordinate system and the current coordinate system.
  • That is, the 3D coordinates of the corresponding 2D points in the current coordinate system of the target image are calculated first, and then the camera pose is calculated from the 3D coordinates of the matching frame image in the map coordinate system (stored in the map data) and the 3D coordinates in the current coordinate system.
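  • A sketch of this pose recovery step with OpenCV's solvePnP is shown below; points_3d are the map-frame 3D feature points of the matching key frame, points_2d are the corresponding target matching feature points in the current image, and K is the camera intrinsic matrix (the distortion handling is an assumption).

```python
import cv2
import numpy as np

def estimate_pose(points_3d, points_2d, K, dist_coeffs=None):
    """Solve the pose of the current target image in the map coordinate system."""
    dist = np.zeros(5) if dist_coeffs is None else dist_coeffs
    ok, rvec, tvec = cv2.solvePnP(
        points_3d.astype(np.float64), points_2d.astype(np.float64), K, dist,
        flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)              # rotation from map frame to camera frame
    cam_pos = (-R.T @ tvec).ravel()         # camera position expressed in the map frame
    return R, tvec.ravel(), cam_pos
```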
  • the bag-of-words model may also be used to perform feature matching on the target image. Specifically, referring to FIG. 4, the following steps may be included:
  • S32-22 Use the image feature information of the target image to match each key frame image in the map data to determine a matching frame image.
  • That is, the bag-of-words model can also be used to extract the image feature information of the target image, and this image feature information is then used for matching against the key frame images.
  • it can include:
  • S51 Calculate the similarity between the target image and each key frame image in the map based on the image feature information, and screen the key frame images with the similarity greater than a threshold to obtain a matching frame set.
  • S52 Group each key frame image in the matching frame set according to the time stamp information and the similarity value of the key frame image to obtain at least one image group.
  • S53 Calculate the matching degree between the image group and the target image, and select the image group with the highest matching degree as the matching image group.
  • S54 Select the key frame image with the highest similarity between the target image and the matched image group as the image to be matched, and compare the similarity corresponding to the image to be matched with a second threshold.
  • Specifically, the image feature information may be used to query the similarity between the current target image and each key frame in the map data, and the key frame images whose similarity is greater than a threshold are retained to generate a matching frame set.
  • The similarity calculation can use the standard bag-of-words score s(v_1, v_2) = 1 - (1/2) * | v_1/|v_1| - v_2/|v_2| |, where:
  • v_1 is the bag-of-words image feature vector of the target image;
  • v_2 is the bag-of-words feature vector of a certain key frame image in the map data.
  • the key frame images can also be grouped according to the timestamp of each key frame image and the similarity value calculated in the above steps.
  • the time stamp may be within the range of a fixed threshold TH1, for example, the time difference between the key frame image and the last key frame image in the image group is within 1.5s.
  • the similarity value between a key frame image and the last key frame image in the image group may be within a threshold TH2 range, for example, 60%-70%.
  • The calculation can take the form of a normalized similarity η(v_t, v_tj) = s(v_t, v_tj) / s(v_t, v_t-Δt), where:
  • s(v_t, v_tj) is the similarity value between the target image and a key frame image in the image group; s(v_t, v_t-Δt) is the similarity value between the target image and the last key frame image in the image group; Δt corresponds to TH_1, and the grouping requires the value to be smaller than TH_2.
  • In addition, the matching degree between the target image and each image group can also be calculated, and only the image group with the highest matching degree is retained.
  • The matching degree of an image group can be calculated, for example, as the sum of the similarity values between the target image and the key frame images in the group, where:
  • v_t is the image feature vector extracted by the bag-of-words model for the target image;
  • V_Ti is the image feature vector extracted by the bag-of-words model for a key frame image in the image group.
  • the image group with the highest matching degree value can be selected as the matching image group.
  • the key frame image with the highest similarity value to the target image calculated in the previous step is selected as the image to be matched.
  • the aforementioned similarity value corresponding to the image to be matched is compared with the preset threshold TH 3. If the similarity score is higher than the threshold TH 3 , the matching is successful and the matching frame image is output; otherwise, the matching fails.
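  • The full matching-frame search of steps S51-S54 could be sketched as below, using the standard bag-of-words similarity score; the threshold values, the group score as a plain sum, and the omission of the TH_2 similarity-closeness check are simplifying assumptions.

```python
import numpy as np

def bow_similarity(v1, v2):
    """s(v1, v2) = 1 - 0.5 * || v1/|v1| - v2/|v2| ||_1, the usual BoW similarity score."""
    a = v1 / max(np.linalg.norm(v1, 1), 1e-12)
    b = v2 / max(np.linalg.norm(v2, 1), 1e-12)
    return 1.0 - 0.5 * np.abs(a - b).sum()

def find_matching_frame(target_v, keyframes, th1=1.5, th_sim=0.3, th3=0.6):
    """keyframes: list of (timestamp, bow_vector, frame_id). Returns the matched frame id or None."""
    scored = [(t, bow_similarity(target_v, v), fid) for t, v, fid in keyframes]
    candidates = sorted((c for c in scored if c[1] > th_sim), key=lambda c: c[0])  # S51
    groups, cur = [], []
    for c in candidates:                                    # S52: split groups at timestamp gaps > TH1
        if cur and (c[0] - cur[-1][0]) > th1:
            groups.append(cur); cur = []
        cur.append(c)
    if cur:
        groups.append(cur)
    if not groups:
        return None
    best_group = max(groups, key=lambda g: sum(s for _, s, _ in g))   # S53: highest matching degree
    _, s_best, fid_best = max(best_group, key=lambda c: c[1])         # S54: best frame in that group
    return fid_best if s_best > th3 else None               # compare with TH3; otherwise matching fails
```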
  • the matching speed of the target image and the key frame image in the map data can be effectively improved.
  • the positioning method of the embodiment of the present application uses the bag-of-words model for image matching, and then accurately calculates the current position and posture through the PnP model.
  • The combination of the two forms a low-cost, high-precision, and robust environment perception method, which is suitable for a variety of complex scenarios and meets the needs of productization.
  • the positioning process takes into account both the 2D and 3D information of the visual key frame, and the position and posture can be provided in the positioning result at the same time, which improves the degree of freedom of positioning compared with other indoor positioning methods. It can be directly applied to mobile terminal equipment, and the positioning process does not need to introduce other external base station equipment, so the positioning cost is low.
  • It should be understood that the serial numbers of the above-mentioned processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • FIG. 7 shows a schematic block diagram of a positioning system 70 according to an embodiment of the present application.
  • the positioning system 70 includes:
  • the positioning instruction response module 701 is configured to collect a target image in response to a positioning instruction.
  • the image feature recognition module 702 is configured to extract image feature information of the target image; wherein the image feature information includes feature point information and descriptor information of the target image.
  • the matching frame screening module 703 is configured to match each key frame image in the map data according to the feature information of the image to determine a matching frame image.
  • the positioning result generating module 704 is configured to generate the current positioning result corresponding to the target image according to the matched frame image.
  • the positioning system 70 further includes:
  • the image feature information matching module is used to perform feature extraction on the target image using the trained bag-of-words model to obtain image feature information; and use the image feature information of the target image to match each key frame image in the map data to Determine the matching frame image.
  • the image feature information matching module may include:
  • the matching frame set screening unit is used to calculate the similarity between the target image and each key frame image in the map based on the image feature information, and to screen the key frame images with the similarity greater than a threshold to obtain a matching frame set .
  • the image group screening unit is configured to group each key frame image in the matching frame set according to the time stamp information and the similarity value of the key frame image to obtain at least one image group.
  • the matching image group screening unit is used to calculate the degree of matching between the image group and the target image, and to select the image group with the highest degree of matching as the matching image group.
  • The similarity comparison unit is configured to select the key frame image with the highest similarity to the target image in the matched image group as the image to be matched, and to compare the similarity corresponding to the image to be matched with a second threshold.
  • The matching frame image determination unit is configured to determine the image to be matched as the matching frame image corresponding to the target image when the similarity corresponding to the image to be matched is greater than the second threshold; or to determine that the matching fails when the similarity corresponding to the image to be matched is less than the second threshold.
  • the positioning system 70 further includes:
  • A posture information acquisition module, configured to perform feature point matching between the target image and the matching frame image based on the image feature information to obtain target matching feature points, and to input the three-dimensional feature information of the matching frame image and the target matching feature points into the trained PnP model to obtain the posture information of the target image.
  • The image feature recognition module 702 may be used to obtain the image feature information of the target image by using the trained SuperPoint-based feature extraction model, and may include:
  • the feature encoding unit is used to encode the target image by an encoder to obtain feature encoding.
  • the point of interest encoding unit is configured to input the feature code into an interest point encoder to obtain the feature point information corresponding to the target image.
  • the descriptor encoding unit is configured to input the feature encoding into the descriptor encoder to obtain the descriptor information corresponding to the target image.
  • the positioning result generating module 704 may also be configured to combine the matching frame image with the pose information of the target image to generate the current positioning result.
  • the positioning system of the embodiment of the present application can be applied to smart mobile terminal devices equipped with cameras, such as mobile phones and tablet computers. And it can be directly applied to mobile terminal equipment, and the positioning process does not need to introduce other external base station equipment, so positioning cost is low. In addition, there is no need to introduce algorithms with higher error rates such as object recognition in the positioning process, and the positioning success rate is high, and the robustness is strong.
  • each unit, module, and other operations and/or functions in the positioning system 70 are used to implement the corresponding process in the method of FIG. 3 or FIG. 4, and are not repeated here for brevity.
  • FIG. 8 shows a schematic block diagram of a map construction device 80 according to an embodiment of the present application.
  • the map building device 80 includes:
  • the environmental image acquisition module 801 is used to acquire environmental images of the current environment.
  • the image feature information recognition module 802 is configured to obtain image feature information of the environmental image, and perform feature point matching on the continuous environmental images according to the image feature information to filter key frame images; wherein, the image feature information includes Feature point information and corresponding descriptor information.
  • the three-dimensional feature information generating module 803 is configured to obtain depth information corresponding to the matched feature points in the key frame image to construct the three-dimensional feature information of the key frame image.
  • the map construction module 804 is configured to construct map data of the current environment based on the key frame image; wherein the map data includes image feature information and three-dimensional feature information corresponding to the key frame image.
  • the map construction device 80 further includes:
  • The image feature information acquisition module is used to perform feature extraction on the key frame images using the trained bag-of-words model to obtain image feature information, so as to construct the map data by using the feature point information, descriptor information, and three-dimensional feature information corresponding to the key frame images together with this image feature information.
  • the environmental image acquisition module may include:
  • the acquisition execution unit is configured to continuously acquire environmental images corresponding to the current environment at a preset frequency by using a monocular camera.
  • The image feature information recognition module may be used to obtain the image feature information of the environment images by using the trained SuperPoint-based feature extraction model, and may include:
  • the encoder processing unit is configured to encode the environment image by using the encoder to obtain the characteristic encoding.
  • the point of interest encoder processing unit is configured to input the feature code into the point of interest encoder to obtain the feature point information corresponding to the environment image.
  • the descriptor encoder processing unit is configured to input the feature code into the descriptor encoder to obtain the descriptor information corresponding to the environment image.
  • the map construction device 80 further includes:
  • The feature extraction model training module is used to construct a synthetic database and use the synthetic database to train an element feature point extraction model; to perform random homography transformations on each original image in the MS-COCO data set to obtain the transformed images corresponding to each original image, and to use the trained Magic Point model to perform feature extraction on the transformed images to obtain the ground-truth feature points of each original image in the MS-COCO data set; and to train the SuperPoint model with each original image in the MS-COCO data set and the corresponding feature point labels as training sample data, so as to obtain the SuperPoint-based feature extraction model.
  • the image feature information recognition module may include:
  • the to-be-matched environment image selection unit is configured to use the first frame of the environment image as the current key frame image, and select one or more frames of the to-be-matched environment image that are continuous with the current key frame image.
  • the feature point matching unit is configured to use the descriptor information to perform feature point matching between the current key frame image and the environment image to be matched, and use the environment image to be matched whose feature point matching result is greater than a preset threshold as The key frame image.
  • the loop unit is configured to use the selected key frame image as the current key frame image, and obtain one or more frames of environmental images to be matched that are continuous with the current key frame image; and use the descriptor information for the current key frame image Feature point matching is performed between the frame image and the environmental image to be matched to continuously filter the key frame images.
  • the three-dimensional feature information generating module may include:
  • the feature point matching pair establishing unit is configured to use the current key frame image and the feature points that match each other in the key frame image that matches the current key frame image to establish a feature point matching pair.
  • the depth information calculation unit is configured to calculate the depth information of the feature point matching to the corresponding feature point, so as to construct the three-dimensional feature information of the key frame image by using the depth information and the feature point information of the feature point.
  • the loop unit is used to select a fixed number of feature points for the current key frame image for matching.
  • the loop unit is configured to select a preset number of feature points for matching according to objects contained in the current key frame image.
  • The loop unit is configured to, after obtaining the feature point matching results between the current key frame image and the environment image to be matched, filter the feature point matching results to remove wrong matching results.
  • the map construction module is used to serialize and store key frame images and corresponding feature point information, descriptor information, and three-dimensional feature information to generate offline map data.
  • FIG. 9 shows a computer system 900 for implementing a wireless communication terminal according to an embodiment of the present invention
  • the wireless communication terminal may be a smart mobile terminal such as a mobile phone or a tablet computer equipped with a camera.
  • The computer system 900 includes a central processing unit (CPU) 901, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage part 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for system operation.
  • the CPU 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904.
  • An input/output (Input/Output, I/O) interface 905 is also connected to the bus 904.
  • The following components are connected to the I/O interface 905: an input part 906 including a keyboard, a mouse, etc.; an output part 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, etc.; a storage part 908 including a hard disk, etc.; and a communication part 909 including a network interface card such as a LAN (Local Area Network) card and a modem.
  • the communication section 909 performs communication processing via a network such as the Internet.
  • the drive 910 is also connected to the I/O interface 905 as needed.
  • a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 910 as needed, so that the computer program read from it is installed into the storage part 908 as needed.
  • an embodiment of the present invention includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 909, and/or installed from the removable medium 911.
  • When the computer program is executed by the central processing unit (CPU) 901, the functions defined in the system of the present application are executed.
  • the computer-readable medium shown in the embodiment of the present invention may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or a combination of any of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device .
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • Each block in the flowchart or block diagram may represent a module, program segment, or part of the code, and the above-mentioned module, program segment, or part of the code contains one or more executable instructions for realizing the specified logical function.
  • It should also be noted that the functions marked in the blocks may occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram or flowchart, and the combination of blocks in the block diagram or flowchart can be implemented by a dedicated hardware-based system that performs the specified function or operation, or can be implemented by It is realized by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present invention may be implemented in software or hardware, and the described units may also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • the computer system 900 of the embodiment of the present application can accurately locate the target scene and display it in time; and can effectively deepen the user's sense of immersion and improve the user experience.
  • the present application also provides a computer-readable medium.
  • The computer-readable medium may be included in the electronic device described in the above-mentioned embodiments; it may also exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by an electronic device, the electronic device realizes the method described in the following embodiments. For example, the electronic device can implement the steps shown in FIG. 1, FIG. 2, FIG. 3, or FIG. 4.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiment described above is only illustrative.
  • The division of units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the unit described as a separate component may or may not be physically separated, and the component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may also be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • this function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the various embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a map construction method, a positioning method and system, a wireless communication terminal, and a computer-readable medium. The map construction method includes: collecting an environment image of the current environment; acquiring image feature information of the environment image, and performing feature point matching on the consecutive environment images according to the image feature information to filter key frame images, wherein the image feature information includes feature point information and corresponding descriptor information; acquiring depth information corresponding to the matched feature points in the key frame images to construct three-dimensional feature information of the key frame images; and constructing map data of the current environment based on the key frame images, wherein the map data includes the image feature information and three-dimensional feature information corresponding to the key frame images. The map construction method and positioning method provided by the present application have the advantages of high positioning accuracy and strong robustness, and are applicable to a variety of complex scenes.

Description

Map construction method, positioning method and system, wireless communication terminal, and computer-readable medium
This application claims priority to the Chinese patent application No. 201911056898.3, entitled "Map construction method and device, positioning method and device", filed on October 31, 2019, the entire contents of which are incorporated into this application by reference.
Technical Field
The embodiments of the present application relate to the technical field of map construction and positioning, and more specifically, to a map construction method, a positioning method, a positioning system, a wireless communication terminal, and a computer-readable medium.
Background
With the continuous development of computer technology, positioning and navigation have been widely used in different fields and in a variety of scenarios, for example positioning and navigation for indoor or outdoor environments. In the prior art, visual information can be used to build an environment map to assist users in perceiving the surrounding environment and quickly locating their own position.
The prior art also has certain deficiencies in the process of constructing maps and positioning. For example, in the process of environment mapping and image matching, the prior art solutions only consider traditional image features, and traditional image features have poor anti-noise ability, resulting in a low positioning success rate. In addition, during positioning, accurate positioning cannot be accomplished if the lighting in the environment changes between bright and dark, or if the environment changes due to factors such as seasonal change. Furthermore, the prior art mostly considers only the two-dimensional features of the visual image in the positioning process, so the degrees of freedom of positioning are limited; there is also the problem of poor positioning robustness.
发明内容
有鉴于此,本申请实施例提供了一种地图构建方法、定位方法即***、无线通信终端和计算机可读介质,能够基于所采集环境图像的图像特征进行地图构建和定位。
第一方面,提供了一种地图构建方法,该方法包括:采集当前环境的环境图像;获取所述环境图像的图像特征信息,根据所述图像特征信息对连续的所述环境图像进行特征点匹配以筛选关键帧图像;其中,所述图像特征信息包括特征点信息和对应的描述子信息;获取所述关键帧图像中匹配的特征点对应的深度信息,以构建所述关键帧图像的三维特征信息;基于所述关键帧图像构建所述当前环境的地图数据;其中,所述地图数据包括所述关键帧图像对应的图像特征信息和三维特征信息。
第二方面,提供了一种定位方法,该方法包括:响应于一定位指令,采集目标图像;提取所述目标图像的图像特征信息;其中,所述图像特征信息包括所述目标图像的特征点信息和描述子信息;根据所述图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像;根据所述匹配帧图像生成所述目标图像对应的当前定位结果。
第三方面,提供了一种定位***,该定位***包括:定位指令响应模块,用于响应于一定位指令,采集目标图像;图像特征识别模块,用于提取所述目标图像的图像特征信息;其中,所述图像特征信息包括所述目标图像的特征点信息和描述子信息;匹配帧筛选模块,用于根据所述图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像;定位结果生成模块,用于根据所述匹配帧图像生成所述目标图像对应的当前定位结果。
第四方面,提供了一种计算机可读介质,用于储存为执行上述第一方面或第二方面中的方法所用的计算机软件指令,其包含用于执行上述各方面所设计的程序。
第五方面,提供了一种无线通信终端,包括:一个或多个处理器;存储装置,用于存储一个或多个程序,当所述一个或多个程序被所述一个或多个处理器执行时,使得所述一个或多个处理器执行上述第一方面或第二方面中的方法。
本申请中,无线通信终端以及定位***等的名字对设备本身不构成限定,在实际实现中,这些设备可以以其他名称出现。只要各个设备的功能和本申请类似,即属于本申请权利要求及其等同技术的范围之内。
本申请的这些方面或其他方面在以下实施例的描述中会更加简明易懂。
附图说明
图1示出了本申请实施例的地图构建方法的示意图。
图2示出了本申请实施例的另一种地图构建方法的流程示意图。
图3示出了本申请实施例的定位方法的流程示意图。
图4示出了本申请实施例的另一种定位方法的流程示意图。
图5示出了本申请实施例的关键帧图像匹配结果示意图。
图6示出了本申请实施例的PnP模型求解原理示意图。
图7示出了本申请实施例的定位***的示意性框图。
图8示出了本申请实施例的地图构建装置的示意性框图。
图9示出了本申请实施例的无线通信终端的计算机***的示意性框图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。
在相关技术中,通过采集视觉图像来构建环境地图时,现有方案只考虑传统的图像特征,而传统图像特征的抗噪能力差,定位成功率低。并且,在构建的地图中,若发生光线明暗变化或者季节变换导致环境特征发生改变,则可能导致无法进行定位。另外,现有方案在构建地图时大多只利用视觉图像的二维特征信息,定位自由度存在欠缺,定位鲁棒性较差。这样,就需要一种方法,解决上述的现有技术存在的缺点和不足。
图1示出了本申请实施例的一种地图构建方法的示意图。如图1所示,该方法包括以下部分或全部内容:
S11,采集当前环境的环境图像;
S12,获取所述环境图像的图像特征信息,根据所述图像特征信息对连续的所述环境图像进行特征点匹配以筛选关键帧图像;其中,所述图像特征信息包括特征点信息和对应的描述子信息;
S13,获取所述关键帧图像中匹配的特征点对应的深度信息,以构建所述关键帧图像的三维特征信息;
S14,基于所述关键帧图像构建所述当前环境的地图数据;其中,所述地图数据包括所述关键帧图像对应的图像特征信息和三维特征信息。
具体地,在对室内或室外环境构建地图时,可以利用单目摄像头按一定的频率连续采集环境图像。所采集的环境图像可以是RGB环境图像。举例来说,在当前环境中,可以控制单目摄像头按照10-20帧/秒的频率采集环境图像,并按一定的速度移动,从而获取当前环境对应的全部环境图像。
可选地,在本申请实施例中,在获取环境图像后,可以利用已训练的基于超点(Super Point)的特征提取模型实时的对环境图像进行特征提取,从而获取各环境图像的图像特征信息。其中,图像特征信息包括特征点信息和对应的描述子信息。
具体的,利用已训练的基于Super Point的特征提取模型获取所述环境图像的图像特征信息可以包括:
S1211,利用编码器对所述环境图像进行编码以获取特征编码;
S1212,将所述特征编码输入感兴趣点编码器,以获取所述环境图像对应的所述特征点信息;以及
S1213,将所述特征编码输入描述子编码器,以获取所述环境图像对应的所述描述子信息。
具体的,基于Super Point的特征提取模型可以包括编码部分,以及解码部分。其中,对于编码部分来说,输入图像可以是全尺寸图像,利用编码器对输入图像进行降维,得到降维后的特征图,即特征编码。对于解码部分来说,可以包括感兴趣点编码器和描述子编码器。利用感兴趣点编码器对特征编码进行解码,输出与环境图像大小相同的特征点信息;以及利用描述子编码器对特征编码进行解码,输出特征点信息对应的描述子信息,利用描述子信息描述对应特征点的图像特征,例如颜色、轮廓等信息。
可选地,在本申请实施例中,可以预先对基于Super Point的特征提取模型进行训练,训练过程可以包括以下步骤:
S21,构建合成数据库,并利用合成数据库训练元素特征点提取模型。
S22,对MS-COCO数据集中各原始图像进行随机的单应性变换以获取各原始图像对应的变换图像,利用已训练的Magic Point模型对变换图像进行特征提取,以获取MS-COCO数据集中各原始图像的特征点真值。
S23,以MS-COCO数据集中各原始图像以及各原始图像对应的特征点标签为训练样本数据,训练Super Point模型。
具体的,可以构建包含多种合成形状的合成数据库;合成形状可以包括简单的二维图形,例如四边形、三角形、线以及椭圆等,并定义各二维图形的关键点位置为各二维图形的Y结、L结、T结以及椭圆的中心和线段分割处。该些关键点可以看作是真实世界中感兴趣点的子集。以上述的合成数据库作为训练样本,训练魔点(Magic Point)模型。利用该Magic Point模型提取基本形状元素特征点。
具体的,可以对MS-COCO(Microsoft-Common Objects in Context,微软-上下文中的公共对象)数据集中各原始图像进行n种随机的单应性变换以获取n个对应的变换图像。上述的单应性变换是指一个从一张图像到另一张图像映射关系的转换矩阵,两张图片中相同颜色的点叫做相关点。以MS-COCO数据集中各原始图像作为输入图像,使用随机的转换矩阵对输入图像进行变换得到变换图像。举例来说,随机的单应变换可以是由简单的变换复合而成的变换方式。
利用已训练的Magic Point模型对各变换图像进行特征点提取以获取各原始图像对应的n个特征点热图。将该n个特征点热图映射回原图坐标系并进行累加,得到最终的特征点累加图。利用预设阈值对特征点累加图中各位置上的特征点进行筛选,筛选出响应较强烈的特征点,利用筛选后的特征点来描述原始图像中的形状。并将筛选后的特征点作为特征点真值,生成特征点标签,作为后续Super Point模型训练的样本数据。
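作为对上述随机单应性变换并累加特征点热图过程的一个简化示意,下面给出一段示例代码(其中的detect_heatmap函数、扰动幅度和阈值等均为示意性假设,并非本申请限定的实现方式):

```python
import cv2
import numpy as np

def homographic_aggregation(image, detect_heatmap, num_homography=10, thresh=0.5):
    """对单张图像做多次随机单应变换,将各热图映射回原图坐标系后累加,并用阈值筛选特征点。
    detect_heatmap为示意性假设的检测函数:输入图像,返回与图像同尺寸的特征点热图(取值0~1)。"""
    h, w = image.shape[:2]
    acc = detect_heatmap(image).astype(np.float32)        # 原图自身的热图
    for _ in range(num_homography):
        # 构造一个带轻微透视和平移扰动的随机单应矩阵(扰动幅度仅作示意)
        H = np.eye(3) + np.random.uniform(-1e-3, 1e-3, (3, 3))
        H[0, 2] += np.random.uniform(-0.05, 0.05) * w
        H[1, 2] += np.random.uniform(-0.05, 0.05) * h
        warped = cv2.warpPerspective(image, H, (w, h))
        heat = detect_heatmap(warped).astype(np.float32)
        # 用逆单应把热图变换回原图坐标系后累加
        acc += cv2.warpPerspective(heat, np.linalg.inv(H), (w, h))
    acc /= (num_homography + 1)
    ys, xs = np.where(acc > thresh)                       # 筛选响应较强的位置作为特征点真值
    return np.stack([xs, ys], axis=1)                     # 返回(N,2)的特征点坐标标签
```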
具体的,对于Super Point模型来说,输入参数可以是全尺寸的图像,使用一个编码器对输入图像进行降维,利用两个解码器分别输出特征点信息和描述子信息。举例来说,编码器可以采用典型的VGG结构,利用连续的三层最大池化层对输入图像进行连续的池化操作,以及一卷积层进行一次卷积操作,将尺寸为H*W的输入图像变换为(H/8)*(W/8)的特征图。解码部分可以包括两个分支,包括感兴趣点解码器和描述子解码器,分别用来提取二维的特征点信息和对应的描述子信息。其中,感兴趣点解码器对编码得到的特征图进行解码,通过增加深度,最后将深度重塑到与原输入图像相同尺寸的输出。描述子解码器对编码得到的特征图进行解码,再经过双三次插值和L2归一化,得到最终的描述子信息。
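为便于理解上述编码-解码结构,下面给出一个在张量形状上与之对应的简化PyTorch示例(通道数、层数等均为示意性假设,且省略了描述子的双三次插值上采样,并非SuperPoint网络的完整实现):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySuperPoint(nn.Module):
    """简化的超点式网络:共享编码器 + 特征点解码分支 + 描述子解码分支(仅作示意)。"""
    def __init__(self, desc_dim=256):
        super().__init__()
        # 编码器:三次2x2最大池化,将H*W的输入降维为(H/8)*(W/8)的特征图
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(128, 65, 1)         # 特征点分支:64个位置通道+1个空通道
        self.desc_head = nn.Conv2d(128, desc_dim, 1)  # 描述子分支:输出desc_dim维半稠密描述子

    def forward(self, x):                              # x: (B,1,H,W)
        feat = self.encoder(x)                         # (B,128,H/8,W/8)
        prob = F.softmax(self.det_head(feat), dim=1)[:, :-1]   # 去掉"无特征点"通道
        heatmap = F.pixel_shuffle(prob, 8)             # 重塑回(B,1,H,W),与原图同尺寸的特征点信息
        desc = F.normalize(self.desc_head(feat), p=2, dim=1)   # L2归一化后的描述子信息
        return heatmap, desc
```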
以MS-COCO数据集中各原始图像以及各原始图像对应的特征点标签为训练样本数据,利用上述的方法完成对Super Point模型的训练。
可选地,在本申请实施例中,上述的根据所述图像特征信息对连续的所述环境图像进行特征点匹配以筛选关键帧图像可以包括:
S1221,以首帧环境图像作为当前关键帧图像,并选取与所述当前关键帧图像连续的一帧或多帧待匹配环境图像。
S1222,利用所述描述子信息对所述当前关键帧图像与所述待匹配环境图像进行特征点匹配,并将特征点匹配结果大于预设阈值的所述待匹配环境图像作为所述关键帧图像。
S1223,将筛选的所述关键帧图像作为当前关键帧图像,并获取与该当前关键帧图像连续的一帧或多帧待匹配环境图像。
S1224,利用所述描述子信息对该当前关键帧图像与待匹配环境图像进行特征点匹配,以连续筛选所述关键帧图像。
具体的,在对环境图像进行关键帧筛选时,可以将单目摄像机采集的首帧环境图像作为初始的当前关键帧图像,以此为起点对后续采集的各帧环境图像进行筛选。由于环境图像是按照一定的频率连续采集的,连续的两张或多张环境图像的差别可能并不大。因此,在进行关键帧图像筛选时,对于当前关键帧图像来说,可以选取与当前关键帧图像连续的一张环境图像,或者连续的2、3或5张环境图像作为待匹配环境图像。
具体的,对于当前关键帧图像和待匹配环境图像而言,在进行特征点匹配时,举例来说,可以选取当前关键帧图像中任意一个特征点的描述子信息,与待匹配环境图像对应的各描述子信息分别计算欧氏距离,并选取欧氏距离最小的描述子信息作为与该描述子信息匹配的描述子信息,从而可以确定与当前关键帧图像中的特征点相匹配的待匹配环境图像中的特征点,并建立特征点匹配对。遍历当前关键帧图像中的各描述子信息,从而获取当前关键帧图像中各特征点的匹配结果。
举例来说,可以对当前关键帧图像选取固定数量的特征点进行匹配,例如选取150、180或200个特征点,从而避免选取的特征点过少导致跟踪失败,或者由于选取的特征点过多而影响计算效率。或者,也可以根据当前关键帧图像中所包含的对象来选取一定数量的特征点,例如选取颜色、形状或结构突出的对象的特征点等等。
此外,在获取当前关键帧图像与待匹配环境图像之间的特征点匹配结果后,还可以利用KNN(k-Nearest Neighbor,最近邻分类)模型对特征点匹配结果进行筛选,从而清除错误的匹配结果。
在特征点匹配结果大于预设阈值时,便可判断为当前关键帧图像与该待匹配环境图像跟踪成功,将该待匹配环境图像作为关键帧图像。例如,在特征点匹配结果的准确率大于70%或者75%时,便判断为跟踪成功,将待匹配环境图像作为关键帧图像。
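对于上述按描述子欧氏距离进行最近邻匹配并剔除错误匹配的过程,下面给出一个简化示例(比率检验与距离阈值等参数均为示意性假设):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8, max_dist=0.7):
    """对desc_a(N,D)中每个描述子,在desc_b(M,D)中寻找欧氏距离最近的描述子,
    并用次近邻比率检验剔除错误匹配,返回特征点匹配对的索引列表(仅作示意)。"""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)     # 与desc_b中所有描述子的欧氏距离
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < max_dist and dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))             # 建立特征点匹配对
    return matches
```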
具体的,在对首帧关键帧图像跟踪成功后,便可以将选中的关键帧图像作为第二张当前关键帧图像,并选取该当前关键帧图像对应的待匹配环境图像,继续进行关键帧图像的判断和筛选。
可选地,在本申请实施例中,在上述的S13中,还可以构建各关键帧图像中各特征点的三维信息,具体可以包括:
S131,利用所述当前关键帧图像,以及与所述当前关键帧图像相匹配的所述关键帧图像中相互匹配的特征点建立特征点匹配对;
S132,计算所述特征点匹配对对应特征点的深度信息,以利用所述特征点的深度信息和特征点信息构建所述关键帧图像的三维特征信息。
具体的,在进行特征点匹配时,对于相互匹配的相邻的两帧关键帧图像中,可以对相互匹配的特征点构建特征点匹配对,并利用特征点匹配对进行运动估计。对于特征点匹配对对应的特征点,可以利用三角化方法计算特征点的深度信息。
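对匹配特征点进行三角化求深度的过程,可以参考如下基于OpenCV的简化示例(相机内参K与两帧位姿均为示意性输入,并非本申请限定的实现):

```python
import cv2
import numpy as np

def triangulate_depth(K, pose1, pose2, pts1, pts2):
    """pose1/pose2为3x4的[R|t](世界坐标系到相机坐标系),pts1/pts2为Nx2的匹配像素坐标。
    返回世界坐标系下的3D点及其在第一帧相机坐标系下的深度(仅作示意)。"""
    P1 = K @ pose1                                     # 第一帧投影矩阵
    P2 = K @ pose2                                     # 第二帧投影矩阵
    pts4d = cv2.triangulatePoints(P1, P2,
                                  pts1.T.astype(np.float64),
                                  pts2.T.astype(np.float64))   # 4xN齐次坐标
    pts3d = (pts4d[:3] / pts4d[3]).T                   # Nx3非齐次坐标
    cam1 = (pose1[:, :3] @ pts3d.T + pose1[:, 3:4]).T  # 变换到第一帧相机坐标系
    return pts3d, cam1[:, 2]                           # 深度即z分量
```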
作为一个可替代的实施例,在本申请实施例中,在筛选关键帧图像后,还可以利用词袋模型对关键帧图像进行图像特征信息提取。具体的,参考图2所示,在S13之后,上述方法还可以包括:
S13-2,利用已训练的词袋模型对所述关键帧图像进行特征提取以获取图像特征信息。
具体的,可以预先训练词袋模型。词袋模型在训练时,可以从训练图片中进行特征提取,例如提取的特征种类数量为w,则每一种特征可以被称为一个单词;已训练的词袋模型可以包括w个单词。
当需要提取一个关键帧的词袋模型特征信息时,每个单词会对该关键帧进行评分,评分值为0~1的浮点数,这样每个关键帧都可以用w维的浮点数来表示,这个w维向量就是词袋模型特征向量,评分公式可以包括:
对于单词 $w_i$,其逆文档频率(IDF)评分为:

$$\mathrm{idf}_i = \log\frac{N}{n_i}$$

对于 $t$ 时刻采集的图像 $I_t$,单词 $w_i$ 的词频(TF)及最终评分为:

$$\mathrm{tf}_i = \frac{n_{iI_t}}{n_{I_t}}, \qquad \eta_i = \mathrm{tf}_i \times \mathrm{idf}_i$$

其中,$N$ 为训练图片数量,$n_i$ 为单词 $w_i$ 在训练图片中出现的次数,$I_t$ 为t时刻采集的图像,$n_{iI_t}$ 为单词 $w_i$ 出现在图像 $I_t$ 里的次数,$n_{I_t}$ 为图像 $I_t$ 里出现的单词总数。通过单词评分,每个关键帧的词袋模型特征信息就是一个w维的浮点数向量。
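按照上述评分方式,给定已训练好的单词表和IDF权重,计算一帧图像的w维词袋特征向量可参考如下简化示例(变量名与归一化方式均为示意性假设):

```python
import numpy as np

def bow_vector(descriptors, vocabulary, idf):
    """descriptors: (M,D)的描述子; vocabulary: (w,D)的单词中心; idf: (w,)的逆文档频率。
    返回按tf-idf加权并归一化的w维词袋特征向量(仅作示意)。"""
    w = vocabulary.shape[0]
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)                   # 每个描述子量化到最近的单词
    counts = np.bincount(words, minlength=w).astype(np.float32)
    tf = counts / max(counts.sum(), 1.0)               # 词频
    vec = tf * idf                                     # tf-idf加权评分
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```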
具体的,上述的词袋模型的训练过程一般可以包括以下步骤:对训练图像进行特征检测,即对各类训练图像提取视觉词汇,将所有的视觉词汇集合在一起;特征表示,即利用K-Means算法构造单词表;单词本的生成,即利用单词表中的词汇表示图像。视觉词袋模型的训练过程采用常规方法即可实现,本公开对此不再赘述。
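对于利用K-Means构造单词表这一步,可以参考如下简化示例(聚类数量等参数为示意性假设):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, num_images, num_words=1000, seed=0):
    """把训练图像的全部描述子(总数,D)聚类成num_words个视觉单词,
    返回单词中心(单词表)和每个单词的IDF权重(仅作示意)。"""
    kmeans = KMeans(n_clusters=num_words, random_state=seed, n_init=10)
    labels = kmeans.fit_predict(all_descriptors)
    vocabulary = kmeans.cluster_centers_               # (num_words, D)的单词表
    n_i = np.bincount(labels, minlength=num_words).astype(np.float32)  # 各单词出现次数
    idf = np.log(num_images / np.maximum(n_i, 1.0))    # idf_i = log(N / n_i)
    return vocabulary, idf
```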
具体的,上述的方法的S14中,在提取各关键帧图像的各项特征信息后,便可以将关键帧图像及对应的特征点信息、描述子信息和三维特征信息进行序列化存储到本地,生成离线形式的地图数据。
此外,还可以对关键帧图像的图像特征信息一并进行存储。
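对于将关键帧及其各项特征信息序列化存储为离线地图数据的过程,下面给出一个简化示例(字段名称与存储格式均为示意性假设):

```python
import pickle

def save_offline_map(keyframes, path="offline_map.pkl"):
    """keyframes为示意性的关键帧列表,每项包含特征点、描述子、三维特征信息、词袋特征与位姿等字段。"""
    with open(path, "wb") as f:
        pickle.dump({"version": 1, "keyframes": keyframes}, f)

def load_offline_map(path="offline_map.pkl"):
    """定位阶段加载离线地图数据。"""
    with open(path, "rb") as f:
        return pickle.load(f)["keyframes"]
```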
因此,本申请实施例的地图构建方法,通过利用环境图像的特征点信息和对应的描述子信息对关键帧图像进行筛选,并利用关键帧图像的特征点信息、描述子信息、基于词袋模型提取的图像特征信息一并构建地图数据,使用了基于深度学习的图像特征,构建的地图数据具有抗噪能力强的优点。并且,通过利用多种图像特征构建地图数据,在环境变化、光线明暗变换等多种场景下依然能够生效,大幅提高了定位精度和定位的鲁棒性。另外,通过在构建地图数据时保存了关键帧图像中各特征点的三维特征信息,实现了同时考虑视觉关键帧的2D和3D信息,在定位时可以同时提供位置和姿态数据,相对于其他室内定位方法提高了定位自由度。
图3示出了本申请实施例的一种定位方法的流程示意图。如图3所示,该方法包括以下部分或全部内容:
S31,响应于一定位指令,采集目标图像。
S32-11,提取所述目标图像的图像特征信息;其中,所述图像特征信息包括所述目标图像的特征点信息和描述子信息;
S32-12,根据所述图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像;
S33,根据所述匹配帧图像生成所述目标图像对应的当前定位结果。
具体的,在用户定位时,可以激活终端设备搭载的单目摄像头,采集当前的RGB格式的目标图像。同时在终端设备加载已构建的离线地图。例如,如上述实施例方法所构建的离线地图。
可选地,在本申请实施例中,对于目标图像,可以利用上述实施例中的已训练的基于Super Point的特征提取模型提取目标图像的图像特征信息。具体来说,可以包括:
S41,利用编码器对所述目标图像进行编码以获取特征编码;
S42,将所述特征编码输入感兴趣点编码器,以获取所述目标图像对应的所述特征点信息;以及
S43,将所述特征编码输入描述子编码器,以获取所述目标图像对应的所述描述子信息。
具体的,在提取目标图像的特征点信息以及描述子信息后,可以利用描述子信息与地图数据中各关键帧图像进行特征点匹配,并在特征点匹配结果大于预设阈值时,判定为匹配成功,将对应的关键帧图像作为匹配帧图像。
可选地,在本申请实施例中,在确定与当前目标图像相匹配的匹配帧图像后,还可以确定终端设备的位姿,再结合位姿信息和其他特征信息生成当前定位结果。
具体来说,确定终端设备的位姿可以包括以下步骤:
S61,基于图像特征信息对所述目标图像与所述匹配帧图像进行特征点匹配,以获取目标匹配特征点。
S62,将所述匹配帧图像的三维特征信息和所述目标匹配特征点输入已训练的PnP模型中,以获取所述目标图像的姿态信息。
具体的,参考图5所示场景,在利用上述步骤对目标图像提取特征点信息以及描述子信息后,可以对当前帧目标图像 $X_C$ 的第N个特征点 $F_{CN}$,遍历匹配帧图像 $X_3$ 的所有特征点,计算特征点描述子之间的欧氏距离;并选取欧氏距离最小的一组进行阈值判断,若小于预设阈值,则形成一组特征点匹配对,否则不形成匹配对。再令N=N+1,遍历当前帧目标图像 $X_C$ 的所有特征点,获取匹配对序列 $\{F_1, F_2, F_3\}$ 作为目标匹配特征点。
具体的,在获取匹配对序列 $\{F_1, F_2, F_3\}$ 后,若匹配对序列的元素数量小于预设阈值,则可以将匹配帧图像 $X_3$ 的位姿信息作为目标图像的位姿信息。
或者,若匹配对序列的元素数量大于预设阈值,则可以利用姿态估计模型来计算目标图像的位姿信息。例如,可以采用PnP(Perspective-n-Point)模型,调用OpenCV中的solvePnP函数求解出目标图像 $X_C$ 在地图坐标系下的当前位姿。
具体来说,PnP模型的输入参数为匹配帧图像中的3D点(即地图坐标系下,该关键帧图像中的特征点)和该些3D点在当前帧目标图像中的投影匹配得到的目标匹配特征点(即当前帧目标图像中的特征点),其输出为当前帧目标图像相对于地图坐标系原点的位姿变换(即当前帧目标图像在地图坐标系中的位姿)。
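对于上述以匹配帧图像中的3D点及其在当前帧中的2D投影为输入、以当前帧位姿为输出的求解过程,可以参考如下调用OpenCV solvePnP的简化示例(相机内参等输入为示意性假设):

```python
import cv2
import numpy as np

def estimate_pose_pnp(points3d_map, points2d_cur, K, dist_coeffs=None):
    """points3d_map: 地图坐标系下的Nx3特征点;points2d_cur: 当前帧中对应的Nx2像素坐标;
    K: 3x3相机内参。返回当前帧相对地图坐标系的旋转矩阵R与平移向量t(仅作示意,需至少4对匹配点)。"""
    if dist_coeffs is None:
        dist_coeffs = np.zeros(4)
    ok, rvec, tvec = cv2.solvePnP(points3d_map.astype(np.float64),
                                  points2d_cur.astype(np.float64),
                                  K, dist_coeffs, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                         # 旋转向量转换为旋转矩阵
    return R, tvec
```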
举例来说,其计算原理可以包括以下内容:参考图6所示,设当前坐标系中心为点O,A、B、C为三个3D特征点,⟨a,b⟩、⟨a,c⟩、⟨b,c⟩分别表示点O指向各特征点的视线方向两两之间的夹角。根据余弦定理有如下公式:

$$OA^2 + OB^2 - 2 \cdot OA \cdot OB \cdot \cos\langle a,b\rangle = AB^2$$

$$OA^2 + OC^2 - 2 \cdot OA \cdot OC \cdot \cos\langle a,c\rangle = AC^2$$

$$OB^2 + OC^2 - 2 \cdot OB \cdot OC \cdot \cos\langle b,c\rangle = BC^2$$
对上式进行消元,同时除以 $OC^2$,并令

$$x = \frac{OA}{OC}, \qquad y = \frac{OB}{OC}$$

则可得到:

$$x^2 + y^2 - 2 \cdot x \cdot y \cdot \cos\langle a,b\rangle = \frac{AB^2}{OC^2}$$

$$x^2 + 1 - 2 \cdot x \cdot \cos\langle a,c\rangle = \frac{AC^2}{OC^2}$$

$$y^2 + 1 - 2 \cdot y \cdot \cos\langle b,c\rangle = \frac{BC^2}{OC^2}$$
接着进行替换,令

$$u = \frac{AB^2}{OC^2}, \qquad w \cdot u = \frac{AC^2}{OC^2}, \qquad v \cdot u = \frac{BC^2}{OC^2}$$

则可得:

$$x^2 + y^2 - 2 \cdot x \cdot y \cdot \cos\langle a,b\rangle = u$$

$$x^2 + 1 - 2 \cdot x \cdot \cos\langle a,c\rangle = wu$$

$$y^2 + 1 - 2 \cdot y \cdot \cos\langle b,c\rangle = vu$$
将上述的3个公式进行运算,则可得:

$$(1 - w) \cdot x^2 - w \cdot y^2 - 2 \cdot x \cdot \cos\langle a,c\rangle + 2 \cdot w \cdot x \cdot y \cdot \cos\langle a,b\rangle + 1 = 0$$

$$(1 - v) \cdot y^2 - v \cdot x^2 - 2 \cdot y \cdot \cos\langle b,c\rangle + 2 \cdot v \cdot x \cdot y \cdot \cos\langle a,b\rangle + 1 = 0$$
其中,$w$、$v$、$\cos\langle a,c\rangle$、$\cos\langle b,c\rangle$、$\cos\langle a,b\rangle$ 为已知量,$x$、$y$ 为未知量,通过以上两方程可以求得 $x$、$y$ 的值,继而可以根据如下公式求解 $OA$、$OB$、$OC$ 的值:

$$OC = \frac{AB}{\sqrt{x^2 + y^2 - 2 \cdot x \cdot y \cdot \cos\langle a,b\rangle}} = \frac{AB}{\sqrt{u}}, \qquad OA = x \cdot OC, \qquad OB = y \cdot OC$$
最后求解三个特征点在当前坐标系下的坐标:根据向量公式,将各特征点到相机中心的距离乘以对应视线方向的单位向量即可,例如特征点A的坐标为 $OA \cdot \vec{a}$,其中 $\vec{a}$ 为点O指向A的归一化视线方向向量,B、C同理。
在获取A、B、C三个特征点在当前坐标系下的坐标后,可以通过地图坐标系到当前坐标系的变换求解相机位姿。
以上过程先求出对应2D点在目标图像所在当前坐标系下的3D坐标,再根据地图数据中地图坐标系下的匹配帧图像3D坐标与当前坐标系下的3D坐标求解相机位姿。
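对于由地图坐标系下的3D坐标与当前坐标系下的3D坐标求解相机位姿这一步,下面给出一个基于SVD的刚体变换估计的简化示意(仅作说明,并非本申请限定的求解方式):

```python
import numpy as np

def rigid_transform_3d(pts_map, pts_cam):
    """已知同一组特征点在地图坐标系(pts_map, Nx3)与当前相机坐标系(pts_cam, Nx3)下的坐标,
    用SVD求解使 pts_cam ≈ R @ pts_map + t 的旋转R与平移t(仅作示意)。"""
    c_map = pts_map.mean(axis=0)
    c_cam = pts_cam.mean(axis=0)
    H = (pts_map - c_map).T @ (pts_cam - c_cam)        # 去中心化后的协方差矩阵
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                           # 避免出现反射
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_cam - R @ c_map
    return R, t
```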
作为一个可替代的实施例,在本申请实施例中,还可以利用词袋模型对目标图像进行特征匹配,具体来说,参考图4所示,可以包括以下步骤:
S31,响应于一定位指令,采集目标图像。
S32-21,利用已训练的词袋模型对所述目标图像进行特征提取以获取图像特征信息;
S32-22,利用所述目标图像的图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像。
S33,根据所述匹配帧图像生成所述目标图像对应的当前定位结果。
可选地,在本申请实施例中,由于地图数据中存储了各关键帧图像的利用词袋模型提取的图像特征信息,因此,可以对目标图像同样利用词袋模型提取图像特征信息,并利用图像特征信息进行匹配。具体来说,可以包括:
S51,基于所述图像特征信息计算所述目标图像和所述地图中各关键帧图像之间的相似度,并筛选相似度大于阈值的关键帧图像,以获取匹配帧集合。
S52,根据所述关键帧图像的时间戳信息以及相似度数值对所述匹配帧集合中的各关键帧图像进行分组,以获取至少一个图像组。
S53,计算所述图像组与所述目标图像之间的匹配度,以选取匹配度最高的所述图像组作为匹配图像组。
S54,选取所述匹配图像组中与所述目标图像之间相似度最高的所述关键帧图像作为待匹配图像,并将所述待匹配图像对应的相似度与第二阈值进行对比。
S55,在所述待匹配图像对应的相似度大于所述第二阈值时,将所述待匹配图像判定为所述目标图像对应的匹配帧图像;或者,
在所述待匹配图像对应的相似度小于所述第二阈值时,判定匹配失败。
具体的,可以利用图像特征信息查询当前的目标图像与地图数据中每个关键帧的相似度,并将相似度大于阈值的关键帧图像生成匹配帧集合。相似度计算公式可以包括:
$$s(v_1, v_2) = 1 - \frac{1}{2}\left\| \frac{v_1}{\|v_1\|} - \frac{v_2}{\|v_2\|} \right\|$$

其中,$v_1$ 为目标图像的词袋模型提取的图像特征向量;$v_2$ 为地图数据中某关键帧图像的词袋模型特征向量。
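上述相似度评分可以参考如下简化示例(采用向量归一化后差值的形式,仅作示意):

```python
import numpy as np

def bow_similarity(v1, v2, eps=1e-12):
    """计算两个词袋特征向量的相似度,结果越接近1表示两帧越相似(仅作示意)。"""
    v1 = v1 / (np.linalg.norm(v1) + eps)
    v2 = v2 / (np.linalg.norm(v2) + eps)
    return 1.0 - 0.5 * np.linalg.norm(v1 - v2)
```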
具体的,在筛选匹配帧集合后,还可以根据各关键帧图像的时间戳,以及上述步骤中计算的相似度数值对关键帧图像进行分组。例如,时间戳差值可以在固定阈值TH1范围内,例如图像组内各关键帧图像与最后一个关键帧图像之间的时间差在1.5s内。
或者,图像组内各关键帧图像与最后一个关键帧图像的相似度值可以在一个阈值TH2范围内,例如为60%-70%。其计算公式可以包括:
$$\eta(v_t, v_{t_j}) = \frac{s(v_t, v_{t_j})}{s(v_t, v_{t-\Delta t})}$$

其中,$s(v_t, v_{t_j})$ 为目标图像与图像组中某一帧关键帧图像的相似度数值;$s(v_t, v_{t-\Delta t})$ 为目标图像与图像组中最后一帧关键帧图像的相似度数值;$\Delta t$ 为TH1,$\eta$ 要小于TH2。
具体的,在划分图像组后,还可以计算目标图像与各图像组之间的匹配度,并仅保留匹配度最高的图像组。匹配度计算公式可以包括:
$$H(v_t, G) = \sum_{i} s(v_t, V_{t_i})$$

其中,$v_t$ 为目标图像的词袋模型提取的图像特征向量;$V_{t_i}$ 为图像组 $G$ 内某一帧关键帧图像的词袋模型提取的图像特征向量。
具体的,在计算目标图像与各图像组的匹配度后,可以筛选出匹配度数值最高的图像组,作为匹配图像组。并选取该图像组中,前序步骤中计算出的与目标图像的相似度数值最高的关键帧图像,作为待匹配图像。
将上述的待匹配图像对应的相似度数值与预设的阈值TH3对比,如果相似度分数高于阈值TH3,则匹配成功,输出匹配帧图像;否则匹配失败。
通过利用词袋模型提取的图像特征信息进行匹配,可以有效的提升目标图像与地图数据中关键帧图像的匹配速度。
因此,本申请实施例的定位方法,采用词袋模型进行图像匹配,再通过PnP模型精确地计算出当前自身位置和姿态,两者结合形成了一套低成本、高精度、强鲁棒性的环境感知方法,适用于多种复杂场景,满足产品化需求。并且,定位过程同时考虑视觉关键帧的2D和3D信息,在定位结果上可以同时提供位置和姿态,相对于其他室内定位方法提高了定位自由度。可以直接应用于移动终端设备,定位过程不需要引入其他外部基站设备,因此定位成本低。另外,定位过程中也不需要引入物体识别等错误率较高的算法,定位成功率高,鲁棒性强。
应理解,本文中术语“***”和“网络”在本文中常被可互换使用。本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
还应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
上述附图仅是根据本发明示例性实施例的方法所包括的处理的示意性说明,而不是限制目的。易于理解,上述附图所示的处理并不表明或限制这些处理的时间顺序。另外,也易于理解,这些处理可以是例如在多个模块中同步或异步执行的。
上文中详细描述了根据本申请实施例的定位方法,下面将结合附图,描述根据本申请实施例的定位***,方法实施例所描述的技术特征适用于以下***实施例。
图7示出了本申请实施例的定位***70的示意性框图。如图7所示,该定位***70包括:
定位指令响应模块701,用于响应于一定位指令,采集目标图像。
图像特征识别模块702,用于提取所述目标图像的图像特征信息;其中,所述图像特征信息包括所述目标图像的特征点信息和描述子信息。
匹配帧筛选模块703,用于根据所述图像特征信息与所述地图数据中各关键帧图像进行匹配以确定匹配帧图像。
定位结果生成模块704,用于根据所述匹配帧图像生成所述目标图像对应的当前定位结果。
可选地,在本申请实施例中,所述定位***70还包括:
图像特征信息匹配模块,用于利用已训练的词袋模型对所述目标图像进行特征提取以获取图像特征信息;以及利用所述目标图像的图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像。
可选地,在本申请实施例中,所述图像特征信息匹配模块可以包括:
匹配帧集合筛选单元,用于基于所述图像特征信息计算所述目标图像和所述地图中各关键帧图像之间的相似度,并筛选相似度大于阈值的关键帧图像,以获取匹配帧集合。
图像组筛选单元,用于根据所述关键帧图像的时间戳信息以及相似度数值对所述匹配帧集合中的各关键帧图像进行分组,以获取至少一个图像组。
匹配图像组筛选单元,用于计算所述图像组与所述目标图像之间的匹配度,以选取匹配度最高的所述图像组作为匹配图像组。
相似度对比单元,用于选取所述匹配图像组中与所述目标图像之间相似度最高的所述关键帧图像作为待匹配图像,并将所述待匹配图像对应的相似度与第二阈值进行对比。
匹配帧图像判断单元,用于在所述待匹配图像对应的相似度大于所述第二阈值时,将所述待匹配图像判定为所述目标图像对应的匹配帧图像;或者,在所述待匹配图像对应的相似度小于所述第二阈值时,判定匹配失败。
可选地,在本申请实施例中,所述定位***70还包括:
姿态信息获取模块,用于基于图像特征信息对所述目标图像与所述匹配帧图像进行特征点匹配,以获取目标匹配特征点;以及将所述匹配帧图像的三维特征信息和所述目标匹配特征点输入已训练的PnP模型中,以获取所述目标图像的姿态信息。
可选地,在本申请实施例中,所述图像特征识别模块702可以用于利用已训练的基于超点的特征提取模型获取所述目标图像的图像特征信息。具体可以包括:
特征编码单元,用于利用编码器对所述目标图像进行编码以获取特征编码。
感兴趣点编码单元,用于将所述特征编码输入感兴趣点编码器,以获取所述目标图像对应的所述特征点信息。
描述子编码单元,用于将所述特征编码输入描述子编码器,以获取所述目标图像对应的所述描述子信息。
可选地,在本申请实施例中,所述定位结果生成模块704还可以用于将所述匹配帧图像结合所述目标图像的位姿信息生成所述当前定位结果。
因此,本申请实施例的定位***,可以应用于手机、平板电脑等配置有摄像头的智能移动终端设备中。且可以直接应用于移动终端设备,定位过程不需要引入其他外部基站设备,因此定位成本低。另外,定位过程中也不需要引入物体识别等错误率较高的算法,定位成功率高,鲁棒性强。
应理解,根据本申请实施例的定位***70中的各个单元、模块的上述和其它操作和/或功能分别为了实现图3或图4方法中的相应流程,为了简洁,在此不再赘述。
上文中详细描述了根据本申请实施例的地图构建方法,下面将结合附图,描述根据本申请实施例的地图构建装置,方法实施例所描述的技术特征适用于以下装置实施例。
图8示出了本申请实施例的地图构建装置80的示意性框图。如图8所示,该地图构建装置80包括:
环境图像采集模块801,用于采集当前环境的环境图像。
图像特征信息识别模块802,用于获取所述环境图像的图像特征信息,根据所述图像特征信息对连续的所述环境图像进行特征点匹配以筛选关键帧图像;其中,所述图像特征信息包括特征点信息和对应的描述子信息。
三维特征信息生成模块803,用于获取所述关键帧图像中匹配的特征点对应的深度信息,以构建所述关键帧图像的三维特征信息。
地图构建模块804,用于基于所述关键帧图像构建所述当前环境的地图数据;其中,所述地图数据包括所述关键帧图像对应的图像特征信息和三维特征信息。
可选地,在本申请实施例中,所述地图构建装置80还包括:
图像特征信息获取模块,用于利用已训练的词袋模型对所述关键帧图像进行特征提取以获取图像特征信息,以利用所述关键帧图像对应的特征点信息、描述子信息、三维特征信息以及基于词袋模型提取的图像特征信息构建所述地图数据。
可选地,在本申请实施例中,所述环境图像采集模块可以包括:
采集执行单元,用于利用单目摄像头以预设频率连续采集当前环境对应的环境图像。
可选地,在本申请实施例中,所述图像特征信息识别模块可以用于利用已训练的基于超点的特征提取模型获取所述环境图像的图像特征信息,包括:
编码器处理单元,用于利用编码器对所述环境图像进行编码以获取特征编码。
感兴趣点编码器处理单元,用于将所述特征编码输入感兴趣点编码器,以获取所述环境图像对应的所述特征点信息。
描述子编码器处理单元,用于将所述特征编码输入描述子编码器,以获取所述环境图像对应的所述描述子信息。
可选地,在本申请实施例中,所述地图构建装置80还包括:
特征提取模型训练模块,用于构建合成数据库,并利用合成数据库训练元素特征点提取模型;对MS-COCO数据集中各原始图像进行随机的单应性变换以获取各原始图像对应的变换图像,利用已训练的魔点模型对变换图像进行特征提取,以获取MS-COCO数据集中各原始图像的特征点真值;以MS-COCO数据集中各原始图像以及各原始图像对应的特征点标签为训练样本数据,训练超点模型以获取基于超点的特征提取模型。
可选地,在本申请实施例中,所述图像特征信息识别模块可以包括:
待匹配环境图像选择单元,用于以首帧环境图像作为当前关键帧图像,并选取与所述当前关键帧图像连续的一帧或多帧待匹配环境图像。
特征点匹配单元,用于利用所述描述子信息对所述当前关键帧图像与所述待匹配环境图像进行特征点匹配,并将特征点匹配结果大于预设阈值的所述待匹配环境图像作为所述关键帧图像。
循环单元,用于将筛选的所述关键帧图像作为当前关键帧图像,并获取与该当前关键帧图像连续的一帧或多帧待匹配环境图像;以及利用所述描述子信息对该当前关键帧图像与待匹配环境图像进行特征点匹配,以连续筛选所述关键帧图像。
可选地,在本申请实施例中,所述三维特征信息生成模块可以包括:
特征点匹配对建立单元,用于利用所述当前关键帧图像,以及与所述当前关键帧图像相匹配的所述关键帧图像中相互匹配的特征点建立特征点匹配对。
深度信息计算单元,用于计算所述特征点匹配对对应特征点的深度信息,以利用所述特征点的深度信息和特征点信息构建所述关键帧图像的三维特征信息。
可选地,在本申请实施例中,所述循环单元用于对当前关键帧图像选取固定数量的特征点进行匹配。
可选地,在本申请实施例中,所述循环单元用于根据当前关键帧图像中所包含的对象选取预设数量的特征点进行匹配。
可选地,在本申请实施例中,所述循环单元用于在获取所述当前关键帧图像与所述待匹配环境图像之间的特征点匹配结果后,对特征点匹配结果进行筛选以清除错误的匹配结果。
可选地,在本申请实施例中,所述地图构建模块用于将关键帧图像及对应的特征点信息、描述子信息和三维特征信息进行序列化存储,以生成离线形式的地图数据。
应理解,根据本申请实施例的地图构建装置80中的各个单元的上述和其它操作和/或功能分别为了实现图1方法中的相应流程,为了简洁,在此不再赘述。
图9示出了实现本申请实施例的无线通信终端的计算机***900;所述无线通信终端可以是配置有摄像头的手机、平板电脑等智能移动终端。
如图9所示,计算机***900包括中央处理单元(Central Processing Unit,CPU)901,其可以根据存储在只读存储器(Read-Only Memory,ROM)902中的程序或者从储存部分908加载到随机访问存储器(Random Access Memory,RAM)903中的程序而执行各种适当的动作和处理。在RAM 903中,还存储有***操作所需的各种程序和数据。CPU901、ROM 902以及RAM 903通过总线904彼此相连。输入/输出(Input/Output,I/O)接口905也连接至总线904。
以下部件连接至I/O接口905:包括键盘、鼠标等的输入部分906;包括诸如阴极射线管(Cathode Ray Tube,CRT)、液晶显示器(Liquid Crystal Display,LCD)等以及扬声器等的输出部分907;包括硬盘等的储存部分908;以及包括诸如LAN(Local Area Network,局域网)卡、调制解调器等的网络接口卡的通信部分909。通信部分909经由诸如因特网的网络执行通信处理。驱动器910也根据需要连接至I/O接口905。可拆卸介质911,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器910上,以便于从其上读出的计算机程序根据需要被安装入储存部分908。
特别地,根据本发明的实施例,下文参考流程图描述的过程可以被实现为计算机软件程序。例如,本发明的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分909从网络上被下载和安装,和/或从可拆卸介质911被安装。在该计算机程序被中央处理单元(CPU)901执行时,执行本申请的***中限定的各种功能。
需要说明的是,本发明实施例所示的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的***、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、闪存、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本发明中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行***、装置或者器件使用或者与其结合使用。而在本发明中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行***、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。
附图中的流程图和框图,图示了按照本发明各种实施例的***、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的***来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本发明实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现,所描述的单元也可以设置在处理器中。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定。
因此,本申请实施例的计算机***900,能够实现对目标场景的准确定位,并及时展示;并可以有效的加深用户的沉浸感,提升用户体验。
需要说明的是,作为另一方面,本申请还提供了一种计算机可读介质,该计算机可读介质可以是上述实施例中描述的电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被一个该电子设备执行时,使得该电子设备实现如下述实施例中所述的方法。例如,所述的电子设备可以实现如图1、图2、图3或图4所示的各个步骤。
此外,上述附图仅是根据本发明示例性实施例的方法所包括的处理的示意性说明,而不是限制目的。易于理解,上述附图所示的处理并不表明或限制这些处理的时间顺序。另外,也易于理解,这些处理可以是例如在多个模块中同步或异步执行的。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的***、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,该单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
该作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
该功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (20)

  1. 一种地图构建方法,其特征在于,包括:
    采集当前环境的环境图像;
    获取所述环境图像的图像特征信息,根据所述图像特征信息对连续的所述环境图像进行特征点匹配以筛选关键帧图像;其中,所述图像特征信息包括特征点信息和对应的描述子信息;
    获取所述关键帧图像中匹配的特征点对应的深度信息,以构建所述关键帧图像的三维特征信息;
    基于所述关键帧图像构建所述当前环境的地图数据;其中,所述地图数据包括所述关键帧图像对应的图像特征信息和三维特征信息。
  2. 根据权利要求1所述的地图构建方法,其特征在于,在筛选所述关键帧图像后,所述方法还包括:
    利用已训练的词袋模型对所述关键帧图像进行特征提取以获取图像特征信息,以利用所述关键帧图像对应的特征点信息、描述子信息、三维特征信息以及基于词袋模型提取的图像特征信息构建所述地图数据。
  3. 根据权利要求1所述的地图构建方法,其特征在于,所述采集当前环境的环境图像,包括:
    利用单目摄像头以预设频率连续采集当前环境对应的环境图像。
  4. 根据权利要求1所述的地图构建方法,其特征在于,所述获取所述环境图像的图像特征信息,包括:利用已训练的基于超点的特征提取模型获取所述环境图像的图像特征信息,具体包括:
    利用编码器对所述环境图像进行编码以获取特征编码;
    将所述特征编码输入感兴趣点编码器,以获取所述环境图像对应的所述特征点信息;以及
    将所述特征编码输入描述子编码器,以获取所述环境图像对应的所述描述子信息。
  5. 根据权利要求4所述的地图构建方法,其特征在于,所述方法还包括:预先训练基于超点的特征提取模型,包括:
    构建合成数据库,并利用合成数据库训练元素特征点提取模型;
    对MS-COCO数据集中各原始图像进行随机的单应性变换以获取各原始图像对应的变换图像,利用已训练的魔点模型对变换图像进行特征提取,以获取MS-COCO数据集中各原始图像的特征点真值;
    以MS-COCO数据集中各原始图像以及各原始图像对应的特征点标签为训练样本数据,训练超点模型以获取基于超点的特征提取模型。
  6. 根据权利要求1或2所述的地图构建方法,其特征在于,所述根据所述图像特征信息对连续的所述环境图像进行特征点匹配以筛选关键帧图像,包括:
    以首帧环境图像作为当前关键帧图像,并选取与所述当前关键帧图像连续的一帧或多帧待匹配环境图像;
    利用所述描述子信息对所述当前关键帧图像与所述待匹配环境图像进行特征点匹配,并将特征点匹配结果大于预设阈值的所述待匹配环境图像作为所述关键帧图像;
    将筛选的所述关键帧图像作为当前关键帧图像,并获取与该当前关键帧图像连续的一帧或多帧待匹配环境图像;
    利用所述描述子信息对该当前关键帧图像与待匹配环境图像进行特征点匹配,以连续筛选所述关键帧图像。
  7. 根据权利要求6所述的地图构建方法,其特征在于,所述获取所述关键帧图像中匹配的特征点对应的深度信息,以构建所述关键帧图像的三维特征信息,包括:
    利用所述当前关键帧图像,以及与所述当前关键帧图像相匹配的所述关键帧图像中相互匹配的特征点建立特征点匹配对;
    计算所述特征点匹配对对应特征点的深度信息,以利用所述特征点的深度信息和特征点信息构建所述关键帧图像的三维特征信息。
  8. 根据权利要求6所述的地图构建方法,其特征在于,所述利用所述描述子信息对该当前关键帧图像与待匹配环境图像进行特征点匹配,包括:
    对当前关键帧图像选取固定数量的特征点进行匹配。
  9. 根据权利要求6所述的地图构建方法,其特征在于,所述利用所述描述子信息对该当前关键帧图像与待匹配环境图像进行特征点匹配,包括:
    根据当前关键帧图像中所包含的对象选取预设数量的特征点进行匹配。
  10. 根据权利要求6所述的地图构建方法,其特征在于,所述方法还包括:
    在获取所述当前关键帧图像与所述待匹配环境图像之间的特征点匹配结果后,对特征点匹配结果进行筛选以清除错误的匹配结果。
  11. 根据权利要求1所述的地图构建方法,其特征在于,所述基于所述关键帧图像构建所述当前环境的地图数据,包括:
    将关键帧图像及对应的特征点信息、描述子信息和三维特征信息进行序列化存储,以生成离线形式的地图数据。
  12. 一种定位方法,其特征在于,包括:
    响应于一定位指令,采集目标图像;
    提取所述目标图像的图像特征信息;其中,所述图像特征信息包括所述目标图像的特征点信息和描述子信息;
    根据所述图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像;
    根据所述匹配帧图像生成所述目标图像对应的当前定位结果。
  13. 根据权利要求12所述的定位方法,其特征在于,所述采集目标图像后,所述方法还包括:
    利用已训练的词袋模型对所述目标图像进行特征提取以获取图像特征信息;
    利用所述目标图像的图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像。
  14. 根据权利要求13所述的定位方法,其特征在于,所述利用所述目标图像的图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像,包括:
    基于所述图像特征信息计算所述目标图像和所述地图中各关键帧图像之间的相似度,并筛选相似度大于阈值的关键帧图像,以获取匹配帧集合;
    根据所述关键帧图像的时间戳信息以及相似度数值对所述匹配帧集合中的各关键帧图像进行分组,以获取至少一个图像组;
    计算所述图像组与所述目标图像之间的匹配度,以选取匹配度最高的所述图像组作为匹配图像组;
    选取所述匹配图像组中与所述目标图像之间相似度最高的所述关键帧图像作为待匹配图像,并将所述待匹配图像对应的相似度与第二阈值进行对比;
    在所述待匹配图像对应的相似度大于所述第二阈值时,将所述待匹配图像判定为所述目标图像对应的匹配帧图像;或者,
    在所述待匹配图像对应的相似度小于所述第二阈值时,判定匹配失败。
  15. 根据权利要求13所述的定位方法,其特征在于,在确定匹配帧图像后,所述方法还包括:
    基于图像特征信息对所述目标图像与所述匹配帧图像进行特征点匹配,以获取目标匹配特征点;
    将所述匹配帧图像的三维特征信息和所述目标匹配特征点输入已训练的PnP模型中,以获取所述目标图像的姿态信息。
  16. 根据权利要求12所述的定位方法,其特征在于,所述提取所述目标图像的图像特征信息,包括:利用已训练的基于超点的特征提取模型获取所述目标图像的图像特征信息,具体包括:
    利用编码器对所述目标图像进行编码以获取特征编码;
    将所述特征编码输入感兴趣点编码器,以获取所述目标图像对应的所述特征点信息;以及
    将所述特征编码输入描述子编码器,以获取所述目标图像对应的所述描述子信息。
  17. 根据权利要求12所述的定位方法,其特征在于,所述根据所述匹配帧图像生成所述目标图像对应的当前定位结果,包括:
    将所述匹配帧图像结合所述目标图像的位姿信息生成所述当前定位结果。
  18. 一种定位***,其特征在于,所述定位***包括:
    定位指令响应模块,用于响应于一定位指令,采集目标图像;
    图像特征识别模块,用于提取所述目标图像的图像特征信息;其中,所述图像特征信息包括所述目标图像的特征点信息和描述子信息;
    匹配帧筛选模块,用于根据所述图像特征信息与地图数据中各关键帧图像进行匹配以确定匹配帧图像;
    定位结果生成模块,用于根据所述匹配帧图像生成所述目标图像对应的当前定位结果。
  19. 一种计算机可读介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至11中任一项所述的地图构建方法,或者实现如权利要求12至17中任一项所述的定位方法。
  20. 一种无线通信终端,其特征在于,所述无线通信终端包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序,当所述一个或多个程序被所述一个或多个处理器执行时,使得所述一个或多个处理器实现如权利要求1至11中任一项所述的地图构建方法,或者实现如权利要求12至17中任一项所述的定位方法。
PCT/CN2020/124547 2019-10-31 2020-10-28 地图构建方法、定位方法及***、无线通信终端、计算机可读介质 WO2021083242A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20880439.3A EP3975123A4 (en) 2019-10-31 2020-10-28 MAP PRODUCTION METHOD, POSITIONING METHOD AND SYSTEM, WIRELESS COMMUNICATIONS TERMINAL AND COMPUTER READABLE MEDIA
US17/561,307 US20220114750A1 (en) 2019-10-31 2021-12-23 Map constructing method, positioning method and wireless communication terminal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911056898.3 2019-10-31
CN201911056898.3A CN110866953B (zh) 2019-10-31 2019-10-31 地图构建方法及装置、定位方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/561,307 Continuation-In-Part US20220114750A1 (en) 2019-10-31 2021-12-23 Map constructing method, positioning method and wireless communication terminal

Publications (1)

Publication Number Publication Date
WO2021083242A1 true WO2021083242A1 (zh) 2021-05-06

Family

ID=69653488

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124547 WO2021083242A1 (zh) 2019-10-31 2020-10-28 地图构建方法、定位方法及***、无线通信终端、计算机可读介质

Country Status (4)

Country Link
US (1) US20220114750A1 (zh)
EP (1) EP3975123A4 (zh)
CN (1) CN110866953B (zh)
WO (1) WO2021083242A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111973A (zh) * 2021-05-10 2021-07-13 北京华捷艾米科技有限公司 一种基于深度相机的动态场景处理方法及装置
CN115115822A (zh) * 2022-06-30 2022-09-27 小米汽车科技有限公司 车端图像处理方法、装置、车辆、存储介质及芯片
CN116091719A (zh) * 2023-03-06 2023-05-09 山东建筑大学 一种基于物联网的河道数据管理方法及***
CN117274442A (zh) * 2023-11-23 2023-12-22 北京新兴科遥信息技术有限公司 一种面向自然资源地图的动画生成方法和***

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866953B (zh) * 2019-10-31 2023-12-29 Oppo广东移动通信有限公司 地图构建方法及装置、定位方法及装置
CN111652933B (zh) * 2020-05-06 2023-08-04 Oppo广东移动通信有限公司 基于单目相机的重定位方法、装置、存储介质与电子设备
CN111990787B (zh) * 2020-09-04 2022-02-25 华软智科(深圳)技术有限公司 基于人工智能的档案馆密集柜自证方法
CN112598732A (zh) * 2020-12-10 2021-04-02 Oppo广东移动通信有限公司 目标设备定位方法、地图构建方法及装置、介质、设备
WO2022231523A1 (en) * 2021-04-29 2022-11-03 National University Of Singapore Multi-camera system
US11983627B2 (en) * 2021-05-06 2024-05-14 Black Sesame Technologies Inc. Deep learning based visual simultaneous localization and mapping
CN115442338A (zh) * 2021-06-04 2022-12-06 华为技术有限公司 3d地图的压缩、解压缩方法和装置
CN113591847B (zh) * 2021-07-28 2022-12-20 北京百度网讯科技有限公司 一种车辆定位方法、装置、电子设备及存储介质
CN113688842B (zh) * 2021-08-05 2022-04-29 北京科技大学 一种基于解耦合的局部图像特征提取方法
CN117036663A (zh) * 2022-04-18 2023-11-10 荣耀终端有限公司 视觉定位方法、设备和存储介质
CN114972909A (zh) * 2022-05-16 2022-08-30 北京三快在线科技有限公司 一种模型训练的方法、构建地图的方法及装置
CN115982399B (zh) * 2023-03-16 2023-05-16 北京集度科技有限公司 图像查找方法、移动设备、电子设备、及计算机程序产品
CN116030136B (zh) * 2023-03-29 2023-06-09 中国人民解放军国防科技大学 基于几何特征的跨视角视觉定位方法、装置和计算机设备
CN117710451A (zh) * 2023-06-02 2024-03-15 荣耀终端有限公司 视觉定位方法、介质以及电子设备

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107677279A (zh) * 2017-09-26 2018-02-09 上海思岚科技有限公司 一种定位建图的方法及***
US20180204338A1 (en) * 2017-01-13 2018-07-19 Otsaw Digital Pte. Ltd. Three-dimensional mapping of an environment
CN109583457A (zh) * 2018-12-03 2019-04-05 荆门博谦信息科技有限公司 一种机器人定位与地图构建的方法及机器人
CN109658445A (zh) * 2018-12-14 2019-04-19 北京旷视科技有限公司 网络训练方法、增量建图方法、定位方法、装置及设备
CN109671120A (zh) * 2018-11-08 2019-04-23 南京华捷艾米软件科技有限公司 一种基于轮式编码器的单目slam初始化方法及***
CN109816686A (zh) * 2019-01-15 2019-05-28 山东大学 基于物体实例匹配的机器人语义slam方法、处理器及机器人
CN109816769A (zh) * 2017-11-21 2019-05-28 深圳市优必选科技有限公司 基于深度相机的场景地图生成方法、装置及设备
CN110866953A (zh) * 2019-10-31 2020-03-06 Oppo广东移动通信有限公司 地图构建方法及装置、定位方法及装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2506338A (en) * 2012-07-30 2014-04-02 Sony Comp Entertainment Europe A method of localisation and mapping
CN108447097B (zh) * 2018-03-05 2021-04-27 清华-伯克利深圳学院筹备办公室 深度相机标定方法、装置、电子设备及存储介质
CN108692720B (zh) * 2018-04-09 2021-01-22 京东方科技集团股份有限公司 定位方法、定位服务器及定位***
CN110310333B (zh) * 2019-06-27 2021-08-31 Oppo广东移动通信有限公司 定位方法及电子设备、可读存储介质
CN110349212B (zh) * 2019-06-28 2023-08-25 Oppo广东移动通信有限公司 即时定位与地图构建的优化方法及装置、介质和电子设备
CN110322500B (zh) * 2019-06-28 2023-08-15 Oppo广东移动通信有限公司 即时定位与地图构建的优化方法及装置、介质和电子设备

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180204338A1 (en) * 2017-01-13 2018-07-19 Otsaw Digital Pte. Ltd. Three-dimensional mapping of an environment
CN107677279A (zh) * 2017-09-26 2018-02-09 上海思岚科技有限公司 一种定位建图的方法及***
CN109816769A (zh) * 2017-11-21 2019-05-28 深圳市优必选科技有限公司 基于深度相机的场景地图生成方法、装置及设备
CN109671120A (zh) * 2018-11-08 2019-04-23 南京华捷艾米软件科技有限公司 一种基于轮式编码器的单目slam初始化方法及***
CN109583457A (zh) * 2018-12-03 2019-04-05 荆门博谦信息科技有限公司 一种机器人定位与地图构建的方法及机器人
CN109658445A (zh) * 2018-12-14 2019-04-19 北京旷视科技有限公司 网络训练方法、增量建图方法、定位方法、装置及设备
CN109816686A (zh) * 2019-01-15 2019-05-28 山东大学 基于物体实例匹配的机器人语义slam方法、处理器及机器人
CN110866953A (zh) * 2019-10-31 2020-03-06 Oppo广东移动通信有限公司 地图构建方法及装置、定位方法及装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KANG KAI: "RESEARCH ON LOCALIZATION AND MAPPING OF MOBILE ROBOT BASED ON MACHINE VISION", INFORMATION SCIENCE AND TECHNOLOGY, CHINESE MASTER’S THESES FULL-TEXT DATABASE, 1 June 2017 (2017-06-01), XP055808927 *
See also references of EP3975123A4

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111973A (zh) * 2021-05-10 2021-07-13 北京华捷艾米科技有限公司 一种基于深度相机的动态场景处理方法及装置
CN115115822A (zh) * 2022-06-30 2022-09-27 小米汽车科技有限公司 车端图像处理方法、装置、车辆、存储介质及芯片
CN115115822B (zh) * 2022-06-30 2023-10-31 小米汽车科技有限公司 车端图像处理方法、装置、车辆、存储介质及芯片
CN116091719A (zh) * 2023-03-06 2023-05-09 山东建筑大学 一种基于物联网的河道数据管理方法及***
CN117274442A (zh) * 2023-11-23 2023-12-22 北京新兴科遥信息技术有限公司 一种面向自然资源地图的动画生成方法和***
CN117274442B (zh) * 2023-11-23 2024-03-08 北京新兴科遥信息技术有限公司 一种面向自然资源地图的动画生成方法和***

Also Published As

Publication number Publication date
CN110866953B (zh) 2023-12-29
US20220114750A1 (en) 2022-04-14
EP3975123A4 (en) 2022-08-31
CN110866953A (zh) 2020-03-06
EP3975123A1 (en) 2022-03-30

Similar Documents

Publication Publication Date Title
WO2021083242A1 (zh) 地图构建方法、定位方法及***、无线通信终端、计算机可读介质
CN109635621B (zh) 用于第一人称视角中基于深度学习识别手势的***和方法
US10198823B1 (en) Segmentation of object image data from background image data
US10198623B2 (en) Three-dimensional facial recognition method and system
CN109359538B (zh) 卷积神经网络的训练方法、手势识别方法、装置及设备
US9965865B1 (en) Image data segmentation using depth data
WO2020010979A1 (zh) 手部关键点的识别模型训练方法、识别方法及设备
Jian et al. The extended marine underwater environment database and baseline evaluations
CN108388882B (zh) 基于全局-局部rgb-d多模态的手势识别方法
WO2020134818A1 (zh) 图像处理方法及相关产品
CN111046125A (zh) 一种视觉定位方法、***及计算机可读存储介质
CN104281840B (zh) 一种基于智能终端定位识别建筑物的方法及装置
CN110019914B (zh) 一种支持三维场景交互的三维模型数据库检索方法
CN106934351B (zh) 手势识别方法、装置及电子设备
WO2018210047A1 (zh) 数据处理方法、数据处理装置、电子设备及存储介质
TW202244680A (zh) 位置姿勢獲取方法、電子設備及電腦可讀儲存媒體
CN114565668A (zh) 即时定位与建图方法及装置
CN111739073A (zh) 高效快速的手持装置的影像配准优化方法
WO2022068569A1 (zh) 水印检测方法、装置、计算机设备及存储介质
WO2022110877A1 (zh) 深度检测方法、装置、电子设备、存储介质及程序
Park et al. Estimating the camera direction of a geotagged image using reference images
Hwang et al. Optimized clustering scheme-based robust vanishing point detection
CN110135474A (zh) 一种基于深度学习的倾斜航空影像匹配方法和***
WO2022016803A1 (zh) 视觉定位方法及装置、电子设备和计算机可读存储介质
CN114694257A (zh) 多人实时三维动作识别评估方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20880439

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020880439

Country of ref document: EP

Effective date: 20211223

NENP Non-entry into the national phase

Ref country code: DE