WO2023083256A1 - Pose display method and apparatus, and system, server and storage medium - Google Patents


Info

Publication number
WO2023083256A1
Authority
WO
WIPO (PCT)
Prior art keywords
positioning
image
target
map
pose
Prior art date
Application number
PCT/CN2022/131134
Other languages
French (fr)
Chinese (zh)
Inventor
李佳宁
李�杰
毛慧
浦世亮
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2023083256A1 publication Critical patent/WO2023083256A1/en


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 19/00 Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S 19/38 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S 19/39 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S 19/42 Determining position
    • G01S 19/45 Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
    • G01S 19/47 Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement, the supplementary measurement being an inertial measurement, e.g. tightly coupled inertial
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 3/00 Measuring distances in line of sight; Optical rangefinders

Definitions

  • the present application relates to the field of computer vision, and in particular to a pose display method, device and system, a server, and a machine-readable storage medium.
  • GPS (Global Positioning System)
  • the Beidou satellite navigation system consists of three parts: the space segment, the ground segment and the user segment. It can provide users with high-precision, high-reliability positioning, navigation, and timing services around the clock, worldwide, and also has regional navigation, positioning, and timing capabilities.
  • the GPS or the Beidou satellite navigation system can be used to locate the terminal device.
  • when the GPS signal or the Beidou signal is relatively good, the GPS or Beidou satellite navigation system can be used to accurately locate the terminal device.
  • when the GPS signal or the Beidou signal is poor, the GPS or Beidou satellite navigation system cannot accurately locate the terminal device. For example, in coal, electric power, petrochemical and other energy industries there is a growing need for positioning, and these positioning requirements generally arise in indoor environments, where problems such as signal occlusion make it impossible to accurately locate terminal devices.
  • the present application provides a pose display method, which is applied to a cloud edge management system.
  • the cloud edge management system includes a terminal device and a server, and the server includes a three-dimensional visual map of the target scene.
  • the method includes: during movement of the terminal device in the target scene, the terminal device acquires a target image of the target scene and motion data of the terminal device, and determines a self-positioning trajectory of the terminal device based on the target image and the motion data; if the target image includes multiple frames of images, the terminal device selects a part of the frame images from the multiple frames as images to be tested, and sends the images to be tested and the self-positioning trajectory to the server; the server generates, based on the images to be tested and the self-positioning trajectory, a fusion positioning trajectory of the terminal device in the three-dimensional visual map, the fusion positioning trajectory including a plurality of fusion positioning poses; for each fusion positioning pose in the fusion positioning trajectory, the server determines a target positioning pose corresponding to the fusion positioning pose and displays the target positioning pose.
  • the present application provides a cloud edge management system
  • the cloud edge management system includes a terminal device and a server
  • the server includes a three-dimensional visual map of the target scene
  • the terminal device is used to acquire the target image of the target scene and the motion data of the terminal device, and to determine the self-positioning trajectory of the terminal device based on the target image and the motion data; if the target image includes multiple frames of images, it selects a part of the frame images from the multiple frames as images to be tested, and sends the images to be tested and the self-positioning trajectory to the server;
  • the server is used to generate, based on the images to be tested and the self-positioning trajectory, the fusion positioning trajectory of the terminal device in the three-dimensional visual map, the fusion positioning trajectory including a plurality of fusion positioning poses; and, for each fusion positioning pose in the fusion positioning trajectory, to determine the target positioning pose corresponding to the fusion positioning pose and display the target positioning pose.
  • the present application provides a pose display device, which is applied to a server in a cloud edge management system, where the server includes a three-dimensional visual map of a target scene. The device includes: an acquisition module, for acquiring an image to be tested and a self-positioning trajectory, wherein the self-positioning trajectory is determined by the terminal device based on the target image of the target scene and the motion data of the terminal device, and the image to be tested is a partial frame image of the multiple frames of images included in the target image; a generating module, configured to generate a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, where the fusion positioning trajectory includes a plurality of fusion positioning poses; and a display module, configured, for each fused positioning pose in the fused positioning trajectory, to determine a target positioning pose corresponding to the fused positioning pose and display the target positioning pose.
  • the present application provides a server, including a processor and a machine-readable storage medium, the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor is used to execute the machine-executable instructions to implement the pose display method according to the embodiment of the present application.
  • the present application provides a machine-readable storage medium.
  • computer instructions are stored on the machine-readable storage medium; when the computer instructions are executed by a processor, the pose display method according to the embodiment of the present application is implemented.
  • a cloud-edge combined positioning and display method is proposed in the embodiment of the present application.
  • the target image and motion data are collected by the terminal device at the edge, and high-frame-rate self-positioning is performed based on the target image and motion data to obtain a high-frame-rate self-positioning trajectory.
  • the server in the cloud receives the images to be tested and the self-positioning trajectory sent by the terminal device, and obtains a high-frame-rate fusion positioning trajectory based on them, that is, a high-frame-rate fusion positioning trajectory in the 3D visual map. It is a vision-based indoor positioning method and can display the fusion positioning trajectory.
  • the terminal device calculates the high-frame-rate self-positioning trajectory and sends only the self-positioning trajectory and a small number of images to be tested, reducing the amount of data transmitted over the network.
  • global positioning is performed on the server, thereby reducing the consumption of computing resources and storage resources of terminal devices. The method can be applied in coal, electric power, petrochemical and other energy industries to realize indoor positioning of personnel (such as workers and inspection personnel), quickly obtain personnel location information, and ensure personnel safety.
  • FIG. 1 is a schematic flowchart of a pose display method in an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a cloud edge management system in an embodiment of the present application
  • FIG. 3 is a schematic flow diagram of determining a self-positioning trajectory in an embodiment of the present application
  • FIG. 4 is a schematic flow diagram of determining a global positioning trajectory in an embodiment of the present application
  • FIG. 5 is a schematic diagram of a self-positioning trajectory, a global positioning trajectory and a fusion positioning trajectory
  • FIG. 6 is a schematic flow diagram of determining a fusion positioning trajectory in an embodiment of the present application
  • FIG. 7 is a schematic structural diagram of a pose display device in an embodiment of the present application
  • although the terms first, second, and third may be used in the embodiments of the present application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining".
  • a pose display method is proposed, which can be applied to the cloud edge management system.
  • the cloud edge management system can include a terminal device (that is, a terminal device at the edge) and a server (that is, a server in the cloud), and the server may include a three-dimensional visual map of the target scene (such as an indoor environment, an outdoor environment, etc.).
  • referring to FIG. 1, which is a schematic flow chart of the pose display method, the method may include:
  • Step 101 During the movement of the terminal device in the target scene, acquire the target image of the target scene and the motion data of the terminal device, and determine the self-positioning trajectory of the terminal device based on the target image and motion data.
  • the terminal device traverses each current frame image from the multiple frames of images; based on the self-positioning poses, the map positions of the terminal device in the self-positioning coordinate system (that is, coordinate positions) and the motion data, it determines the self-positioning pose corresponding to the current frame image (that is, the self-positioning pose of the terminal device); based on the self-positioning poses corresponding to each frame of the multiple frames of images, the self-positioning trajectory of the terminal device in the self-positioning coordinate system is generated.
  • a pose includes a position and an orientation (posture).
  • the self-positioning coordinate system is a coordinate system established with the self-positioning pose corresponding to the first frame image of the multiple frames of images as the coordinate origin.
  • if the current frame image is a key image, the map position of the terminal device in the self-positioning coordinate system may be generated based on the current position of the terminal device (i.e., the position corresponding to the current frame image). If the current frame image is a non-key image, the map position of the terminal device in the self-positioning coordinate system does not need to be generated based on the current position of the terminal device.
  • the current location of the terminal device is, for example, the actual physical location of the terminal device when it collects the current frame of image.
  • if the number of matching feature points between the current frame image and the previous frame image of the current frame image does not reach a preset threshold, it is determined that the current frame image is a key image. If the number of matching feature points between the current frame image and the previous frame image reaches the preset threshold, it is determined that the current frame image is a non-key image.
  • the first frame of image may be used as a key image.
  • Step 102 If the target image includes multiple frames of images, the terminal device selects a part of frame images from the multiple frames of images as the image to be tested, and sends the image to be tested and the self-positioning trajectory to the server.
  • the terminal device may select M frames of images from multiple frames of images as images to be tested, and M may be a positive integer, such as 1, 2, 3, and so on.
  • Step 103 the server generates a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, and the fusion positioning trajectory may include multiple fusion positioning poses.
  • the server may determine the target map point corresponding to the image to be tested from the three-dimensional visual map of the target scene, and determine the global positioning track of the terminal device in the three-dimensional visual map based on the target map point. Then, the server generates a fusion positioning track of the terminal device in the three-dimensional visual map based on the self-positioning track and the global positioning track.
  • the frame rate of the fused positioning poses included in the fused positioning track may be greater than the frame rate of the global positioning poses included in the global positioning track, that is, the frame rate of the fused positioning track is higher than the frame rate of the global positioning track.
  • the pose frame rate refers to the frequency of pose output, that is, the number of poses output by the system per second.
  • the fused positioning track may be a high-frame-rate pose sequence in the 3D visual map, and the global positioning track a low-frame-rate pose sequence in the 3D visual map.
  • the frame rate of the fused localization trajectory is higher than that of the global localization trajectory, indicating that the number of fused localization poses is greater than the number of global localization poses.
  • the frame rate of the fused positioning poses included in the fused positioning track may be equal to the frame rate of the self-positioning poses included in the self-positioning track, that is, the frame rate of the fused positioning track is equal to the frame rate of the self-positioning track; the self-positioning track can thus also be a high-frame-rate pose sequence.
  • the frame rate of the fused localization trajectory is equal to the frame rate of the self-localization trajectory, which means that the number of fused localization poses is equal to the number of self-localization poses.
  • the 3D visual map may include, but is not limited to, at least one of the following: a pose matrix corresponding to a sample image, a sample global descriptor corresponding to the sample image, sample local descriptors corresponding to the feature points in the sample image, and map point information.
  • the server determines the target map points corresponding to the images to be tested from the 3D visual map of the target scene, and determines the global positioning track of the terminal device in the 3D visual map based on the target map points, which may include, but is not limited to: for each frame of the image to be tested, selecting candidate sample images from the multiple frames of sample images based on the similarity between the image to be tested and the multiple frames of sample images corresponding to the three-dimensional visual map.
  • a plurality of feature points are acquired from the image to be tested; for each feature point, a target map point corresponding to the feature point is determined from the plurality of map points corresponding to the candidate sample image.
  • a global positioning pose in the three-dimensional visual map corresponding to the image to be tested is determined based on the plurality of feature points and target map points corresponding to the plurality of feature points.
  • a global positioning trajectory of the terminal device in the three-dimensional visual map is generated based on the global positioning poses corresponding to all images to be tested.
  • the server selects candidate sample images from the multi-frame sample images, which may include: determining the global descriptor to be tested corresponding to the image to be tested, and determining the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the 3D visual map, where the 3D visual map includes at least the sample global descriptor corresponding to each frame of sample image. Based on these distances, a candidate sample image is selected from the multiple frames of sample images, where the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance among the distances between the global descriptor to be tested and each sample global descriptor, and/or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is less than a distance threshold.
  • the server determines the global descriptor to be tested corresponding to the image to be tested, which may include but is not limited to: determining the bag-of-words vector corresponding to the image to be tested based on a trained dictionary model, and determining the bag-of-words vector as the global descriptor to be tested corresponding to the image to be tested; or, inputting the image to be tested into a trained deep learning model to obtain a target vector corresponding to the image to be tested, and determining the target vector as the global descriptor to be tested corresponding to the image to be tested.
  • the above is just an example of determining the global descriptor to be tested, and is not limited thereto.
  • the server determines the target map point corresponding to the feature point from the multiple map points corresponding to the candidate sample image, which may include, but is not limited to: determining the local descriptor to be tested corresponding to the feature point, where the local descriptor to be tested is used to represent the feature vector of the image block where the feature point is located, and the image block may be located in the image to be tested; and determining the distance between the local descriptor to be tested and the sample local descriptor corresponding to each map point corresponding to the candidate sample image, where the three-dimensional visual map includes at least the sample local descriptor corresponding to each map point corresponding to the candidate sample image.
  • the target map point can be selected from the multiple map points corresponding to the candidate sample image based on the distance between the local descriptor to be tested and each sample local descriptor; the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point may be the minimum distance among the distances between the local descriptor to be tested and each sample local descriptor, and/or the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is less than a distance threshold.
  • the server generates the fused positioning trajectory of the terminal device in the 3D visual map based on the self-positioning trajectory and the global positioning trajectory, which may include, but is not limited to: the server may select, from all self-positioning poses included in the self-positioning trajectory, N self-positioning poses corresponding to a target time period, and select, from all global positioning poses included in the global positioning trajectory, P global positioning poses corresponding to the target time period, where N and P are positive integers and N is greater than P. Based on the N self-positioning poses and the P global positioning poses, N fusion positioning poses corresponding to the N self-positioning poses are determined, the N self-positioning poses corresponding one-to-one with the N fusion positioning poses. Based on the N fusion positioning poses, the fusion positioning trajectory of the terminal device in the 3D visual map is generated.
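  • as a non-limiting sketch of this fusion step (the embodiment above does not fix a particular algorithm), the N self-positioning poses can be aligned to the 3D visual map with a rigid transform estimated from the P pose pairs matched by timestamp; the function names below are illustrative:

```python
# Hedged sketch: align the high-rate self-positioning trajectory to the
# low-rate global trajectory with a rigid transform (Kabsch) estimated
# from the P timestamp-matched pose pairs, then map all N self-positioning
# positions into the 3D visual map. Illustrative only.
import numpy as np

def kabsch(src, dst):
    """Rigid transform (R, t) minimizing ||R @ src_i + t - dst_i||."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def fuse_positions(self_positions, global_positions, matched_idx):
    """self_positions: (N,3) high-frame-rate positions in the self-positioning
    coordinate system; global_positions: (P,3) positions in the 3D visual map;
    matched_idx: the P indices of self poses matched to the global poses.
    Returns (N,3) fused positions in the 3D visual map."""
    R, t = kabsch(self_positions[matched_idx], global_positions)
    return self_positions @ R.T + t
```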
  • the server may also update the fused positioning track. Specifically, the server may also select an initial fused positioning pose from the fused positioning trajectory, and select an initial self-localization pose corresponding to the initial fused positioning pose from the self-localization trajectory.
  • the target self-localization pose is selected from the self-localization trajectory, and the target fusion localization pose is determined based on the initial fusion localization pose, the initial self-localization pose and the target self-localization pose. Then, a new fusion positioning trajectory is generated based on the target fusion positioning pose and the fusion positioning trajectory to replace the original fusion positioning trajectory.
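  • one natural reading of this update step, sketched below under that assumption with 4x4 homogeneous pose matrices: the relative motion measured by self-positioning between the initial and target poses is composed onto the initial fused pose; the names are illustrative:

```python
import numpy as np

def update_fused_pose(T_fused_init, T_self_init, T_self_target):
    """All arguments are 4x4 homogeneous pose matrices. The self-positioning
    relative motion between the initial and target poses is re-applied on
    top of the initial fused positioning pose to get the target fused pose."""
    T_rel = np.linalg.inv(T_self_init) @ T_self_target
    return T_fused_init @ T_rel
```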
  • Step 104 For each fused positioning pose in the fused positioning trajectory, the server determines a target positioning pose corresponding to the fused positioning pose, and displays the target positioning pose.
  • the server may determine the fused positioning pose as the target positioning pose, and display the target positioning pose on the three-dimensional visual map.
  • alternatively, the server converts the fusion positioning pose into the target positioning pose in the 3D visualization map, and displays the target positioning pose through the 3D visualization map.
  • the 3D visual map is constructed by the visual mapping algorithm and is used only by the map positioning algorithm; the 3D visualization map is a 3D model used to show the 3D structure of the scene.
  • the method of determining the target transformation matrix between the 3D visual map and the 3D visualization map may include, but is not limited to: for each of multiple calibration points in the target scene, determining the coordinate pair corresponding to the calibration point, the coordinate pair including the position coordinates of the calibration point in the three-dimensional visual map and the position coordinates of the calibration point in the three-dimensional visualization map; the target transformation matrix is determined based on the coordinate pairs corresponding to the multiple calibration points.
  • for example, select an initial transformation matrix; map the position coordinates in the three-dimensional visual map to mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix, and determine whether the initial transformation matrix has converged based on the relationship between the mapping coordinates and the actual coordinates in the three-dimensional visualization map; if yes, determine the initial transformation matrix as the target transformation matrix; if not, adjust the initial transformation matrix, take the adjusted transformation matrix as the new initial transformation matrix, and return to the mapping step, and so on, until the target transformation matrix is obtained.
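  • a minimal sketch of this estimation, assuming a linear least-squares (affine) fit over the calibration-point coordinate pairs in place of the adjust-until-converged loop; the names and tolerance are illustrative:

```python
# Hedged sketch: estimate the target transformation matrix from the
# coordinate pairs of the calibration points by linear least squares.
import numpy as np

def estimate_transform(pts_map, pts_vis):
    """pts_map: (K,3) calibration-point positions in the 3D visual map;
    pts_vis: (K,3) positions of the same points in the 3D visualization
    map. Returns a 4x4 matrix T with pts_vis ≈ (T @ [p; 1])[:3]."""
    K = len(pts_map)
    A = np.hstack([pts_map, np.ones((K, 1))])        # (K,4) homogeneous
    X, *_ = np.linalg.lstsq(A, pts_vis, rcond=None)  # (4,3) least squares
    T = np.eye(4)
    T[:3, :] = X.T
    return T

def converged(T, pts_map, pts_vis, tol=1e-3):
    """Convergence check from the embodiment: map the visual-map coordinates
    and compare against the actual visualization-map coordinates."""
    mapped = np.hstack([pts_map, np.ones((len(pts_map), 1))]) @ T.T
    return float(np.abs(mapped[:, :3] - pts_vis).max()) < tol
```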
  • in summary, a positioning and display method combining cloud and edge is proposed: the target image and motion data are collected by the terminal device at the edge, and high-frame-rate self-positioning is performed based on the target image and motion data to obtain a high-frame-rate self-positioning trajectory.
  • the server in the cloud receives the images to be tested and the self-positioning trajectory sent by the terminal device, and obtains a high-frame-rate fusion positioning trajectory in the 3D visual map based on them, achieving a high-frame-rate, high-precision, low-cost, and easy-to-deploy indoor positioning function. It is a vision-based indoor positioning method, and the fusion positioning trajectory can be displayed in the three-dimensional visual map.
  • the terminal device calculates the high-frame-rate self-positioning trajectory and sends only the self-positioning trajectory and a small number of images to be tested, reducing the amount of data transmitted over the network.
  • global positioning is performed on the server, thereby reducing the consumption of computing resources and storage resources of terminal devices. The method can be applied in coal, electric power, petrochemical and other energy industries to realize indoor positioning of personnel (such as workers and inspection personnel), quickly obtain personnel location information, and ensure personnel safety.
  • the embodiment of the present application proposes a cloud-edge combined visual positioning and display method.
  • the server determines the fused positioning track of the terminal device in the 3D visual map, and displays the fused positioning track.
  • the target scene can be an indoor environment, that is, when the terminal device moves in the indoor environment, the server determines the fusion positioning track of the terminal device in the 3D visual map, that is, a vision-based indoor positioning method is proposed.
  • the target scene can also be an outdoor environment. There is no restriction on this.
  • the cloud edge management system may include terminal devices (that is, edge terminal devices) and servers (that is, cloud servers). Of course, the cloud edge management system may also include other devices, such as wireless base stations and routers; there is no restriction on this.
  • the server may include a 3D visual map of the target scene and a 3D visualization map corresponding to the 3D visual map. The server may generate a fusion positioning trajectory of the terminal device in the 3D visual map and display the fusion positioning trajectory in the 3D visualization map (the trajectory needs to be converted into one that can be displayed in the three-dimensional visualization map), so that managers can view the fusion positioning trajectory in the three-dimensional visualization map through the web.
  • the terminal device may include a vision sensor and a motion sensor, etc.
  • the vision sensor may be a camera, etc., and the vision sensor is used to collect images of a target scene during the movement of the terminal device. For the convenience of distinction, this image is recorded as a target image, and the target image includes multiple frames of images (that is, multiple frames of real-time images collected during the movement of the terminal device).
  • the motion sensor can be, for example, an IMU (Inertial Measurement Unit).
  • the IMU is a measuring device including a gyroscope and an accelerometer.
  • the motion sensor is used to collect motion data of the terminal device during the movement of the terminal device, such as acceleration and angular velocity etc.
  • the terminal device can be a wearable device (such as a video helmet, smart watch, or smart glasses), with the visual sensor and the motion sensor deployed on the wearable device; or the terminal device can be a recorder (a device carried by personnel when performing work, which integrates real-time video and audio collection, photography, recording, intercom, positioning, etc.), with the visual sensor and motion sensor deployed on the recorder; or the terminal device is a camera (such as a split camera), with the vision sensor and motion sensor deployed on the camera.
  • the terminal device can acquire target images and motion data, perform high frame rate self-positioning based on the target images and motion data, and obtain high frame rate self-positioning trajectories (such as 6DOF (six degrees of freedom) self-positioning trajectories).
  • the self-localization trajectory may include multiple self-localization poses. Since the self-localization trajectory is a self-localization trajectory with a high frame rate, the number of self-localization poses in the self-localization trajectory is relatively large.
  • the terminal device can select some frame images from the multi-frame images of the target image as the image to be tested, and send the high frame rate self-positioning trajectory and the image to be tested to the server.
  • the server can obtain the self-positioning track and the images to be tested, and can perform global positioning at a low frame rate according to the images to be tested and the 3D visual map of the target scene, obtaining a global positioning track with a low frame rate (that is, the global positioning track of the terminal device in the 3D visual map).
  • the global positioning track may include multiple global positioning poses. Since the global positioning track is a global positioning track with a low frame rate, the number of global positioning poses in the global positioning track is relatively small.
  • the server can fuse the high frame rate self-positioning trajectory and the low frame rate global positioning trajectory to obtain the high frame rate fusion positioning trajectory, that is, the high frame rate fusion positioning trajectory in the 3D visual map, that is, the high frame rate fusion positioning results.
  • the fused positioning trajectory may include multiple fused positioning poses. Since the fused positioning trajectory is a high frame rate fused positioning trajectory, the number of fused positioning poses in the fused positioning trajectory is relatively large.
  • a pose (such as a self-positioning pose, a global positioning pose, or a fusion positioning pose) consists of a position and an orientation, generally represented by a rotation matrix and a translation vector; there is no limitation on this.
  • in this way, a globally unified high-frame-rate visual positioning function can be realized, and a high-frame-rate fusion positioning trajectory (such as a 6DOF pose sequence) in the three-dimensional visual map can be obtained. It is a globally consistent high-frame-rate positioning method, which realizes a high-frame-rate, high-precision, low-cost, and easy-to-deploy indoor positioning function for terminal devices.
  • the terminal device is an electronic device with a visual sensor and a motion sensor; it can acquire the target image of the target scene (such as a continuous video image) and the motion data of the terminal device (such as IMU data), and determine the self-positioning trajectory of the terminal device based on the target image and motion data.
  • the target image may include multiple frames of images, and for each frame of images, the terminal device determines a self-localization pose corresponding to the image, that is, multiple frames of images correspond to multiple self-localization poses.
  • the self-positioning track of the terminal device may include multiple self-positioning poses, which can be understood as a collection of multiple self-positioning poses.
  • the terminal device determines the self-localization pose corresponding to the first frame image, and for the second frame image in the multi-frame images, the terminal device determines the self-localization pose corresponding to the second frame image, and so on.
  • the self-positioning pose corresponding to the first frame image can be the coordinate origin of the reference coordinate system (that is, the self-positioning coordinate system); the self-positioning pose corresponding to the second frame image is a pose point in the reference coordinate system, that is, a pose point relative to the coordinate origin (the self-positioning pose corresponding to the first frame image); the self-positioning pose corresponding to the third frame image is likewise a pose point in the reference coordinate system relative to the coordinate origin; and so on: the self-positioning pose corresponding to each frame image is a pose point in the reference coordinate system.
  • these self-localization poses can be composed into a self-localization trajectory in the reference coordinate system, and the self-localization trajectory includes these self-localization poses.
  • Step 301 acquiring a target image of a target scene and motion data of a terminal device.
  • Step 302 if the target image includes multiple frames of images, traverse the current frame of images from the multiple frames of images.
  • for the first frame image among the multiple frames of images, the self-positioning pose corresponding to the first frame image can be the coordinate origin of the reference coordinate system (that is, the self-positioning coordinate system); that is, the self-positioning pose coincides with the coordinate origin.
  • subsequent steps may be used to determine the self-localization pose corresponding to the second frame image.
  • subsequent steps can be used to determine the self-localization pose corresponding to the third frame image, and so on, each frame image can be traversed as the current frame image.
  • Step 303 using the optical flow algorithm to calculate the feature point association relationship between the current frame image and the previous frame image of the current frame image.
  • the optical flow algorithm uses the change of pixels of the current frame image in the time domain and the correlation between the current frame image and the previous frame image to find the correspondence between the two frames, thereby calculating the motion information of objects between the current frame image and the previous frame image.
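  • a minimal sketch of this feature-point association, assuming OpenCV's pyramidal Lucas-Kanade tracker as the optical flow algorithm (the embodiment does not name one); window size and pyramid depth are illustrative:

```python
import cv2
import numpy as np

def track_features(prev_gray, cur_gray, prev_pts):
    """prev_pts: (N,1,2) float32 feature positions from the previous frame.
    Returns the (previous, current) point pairs that tracked successfully."""
    cur_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1          # keep only successfully tracked points
    return prev_pts[ok], cur_pts[ok]
```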
  • Step 304 Determine whether the current frame image is a key image based on the number of matching feature points between the current frame image and the previous frame image. For example, if the number of matching feature points between the current frame image and the previous frame image does not reach the preset threshold, indicating that the two frames have changed greatly and therefore match relatively few feature points, it is determined that the current frame image is a key image, and step 305 is performed. If the number of matching feature points reaches the preset threshold, it is determined that the current frame image is a non-key image, and step 306 is performed.
  • the matching ratio between the current frame image and the previous frame image can also be calculated based on the number of matching feature points between the current frame image and the previous frame image, for example, the ratio of the number of matching feature points to the total number of feature points Proportion. If the matching ratio does not reach the preset ratio, it is determined that the current frame image is a key image, and if the matching ratio reaches the preset ratio, it is determined that the current frame image is a non-key image.
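  • a sketch of the key-image test combining the matching-count criterion of step 304 with the matching-ratio variant above; both thresholds are illustrative, not from the embodiment:

```python
def is_key_image(num_matched, total_features,
                 count_threshold=50, ratio_threshold=0.6):
    """A frame is a key image when too few feature points match the
    previous frame, by absolute count or by matching ratio."""
    if num_matched < count_threshold:
        return True                                   # large change: key image
    ratio = num_matched / max(total_features, 1)      # matching ratio
    return ratio < ratio_threshold
```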
  • Step 305 if the current frame image is the key image, generate a map position in the self-positioning coordinate system (ie, the reference coordinate system) based on the current position of the terminal device (ie, the position where the terminal device is when collecting the current frame image), that is, generate A new 3D map location. If the current frame image is a non-key image, the map position of the terminal device in the self-positioning coordinate system does not need to be generated based on the current position of the terminal device.
  • Step 306 Determine the self-positioning pose corresponding to the current frame image based on the self-positioning pose corresponding to each of the K frames of images preceding the current frame image, the map positions of the terminal device in the self-positioning coordinate system, and the motion data of the terminal device.
  • K can be a positive integer, and can be a value configured according to experience, and there is no limitation on this.
  • all the motion data between the previous frame image of the current frame image and the current frame image can be pre-integrated to obtain the inertial measurement constraints between the two frame images.
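  • a toy sketch of this pre-integration between two image timestamps, assuming raw accelerometer/gyroscope samples at a fixed rate; production VIO systems use on-manifold pre-integration with bias and noise terms, which this omits:

```python
import numpy as np

def preintegrate(imu_samples, dt):
    """imu_samples: iterable of (accel(3,), gyro(3,)) in the body frame,
    sampled every dt seconds. Returns the relative rotation, velocity,
    and position constraints (delta_R, delta_v, delta_p)."""
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for accel, gyro in imu_samples:
        accel = np.asarray(accel, dtype=float)
        dp += dv * dt + 0.5 * (dR @ accel) * dt**2
        dv += (dR @ accel) * dt
        # Small-angle (first-order) update of the incremental rotation.
        w = np.asarray(gyro, dtype=float) * dt
        W = np.array([[0.0, -w[2], w[1]],
                      [w[2], 0.0, -w[0]],
                      [-w[1], w[0], 0.0]])
        dR = dR @ (np.eye(3) + W)
    return dR, dv, dp
```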
  • based on the self-positioning poses and motion data (such as velocity, acceleration, angular velocity, etc.) corresponding to the K frames of images (such as a sliding window) preceding the current frame image, the map positions in the self-positioning coordinate system, and the inertial measurement constraints between the previous frame image and the current frame image (the object's velocity, acceleration, angular velocity, etc.), bundle adjustment can be used to jointly optimize and update the state variables "self-positioning poses and velocities corresponding to the K frames of images (such as a sliding window) preceding the current frame image", "inertial measurement sensor offsets", and "map point positions in the self-positioning coordinate system", obtaining the self-positioning pose corresponding to the current frame image; there is no limitation on this bundle optimization process.
  • in addition, a certain frame and some of the map positions within the sliding window can also be marginalized, and this constraint information can be preserved in prior form.
  • the terminal device can use the VIO (Visual Inertial Odometry) algorithm to determine the self-positioning pose; that is, the input data of the VIO algorithm are the target image and the motion data, and the output data of the VIO algorithm is the self-positioning pose.
  • the VIO algorithm is used to perform steps 301 to 306 to obtain the self-localization pose.
  • the VIO algorithm can include, but is not limited to, VINS (Visual Inertial Navigation System), SVO (Semi-direct Visual Odometry), MSCKF (Multi-State Constraint Kalman Filter), etc.; there is no limitation here, as long as the self-positioning pose can be obtained.
  • Step 307 Generate a self-positioning trajectory of the terminal device in the self-positioning coordinate system based on the self-positioning pose corresponding to each frame of the multi-frame images, and the self-positioning trajectory includes multiple self-positioning poses in the self-positioning coordinate system.
  • the terminal device can obtain the self-localization trajectory in the self-localization coordinate system, and the self-localization trajectory may include the self-localization pose corresponding to each frame of multiple images.
  • the terminal device can obtain the self-positioning poses corresponding to these images; that is, the self-positioning trajectory can include a large number of self-positioning poses, meaning the terminal device can obtain a high-frame-rate self-positioning track.
  • the terminal device may select a part of frame images from the multiple frames of images as the image to be tested, and send the image to be tested and the self-positioning trajectory to the server. For example, the terminal device sends the self-positioning trajectory and the image to be tested to the server through a wireless network (such as 4G, 5G, Wifi, etc.). Since the frame rate of the image to be tested is low, the network bandwidth occupied is small.
  • it is necessary to pre-build a 3D visual map of the target scene and store the 3D visual map in the server, so that the server can perform global positioning based on the 3D visual map.
  • the 3D visual map is a storage method for the image information of the target scene: multiple frames of sample images of the target scene can be collected, and the 3D visual map is built based on these sample images. For example, based on the multi-frame sample images of the target scene, visual mapping algorithms such as SFM (Structure From Motion) or SLAM (Simultaneous Localization And Mapping) can be used to construct the 3D visual map of the target scene; there is no limitation on how it is constructed.
  • the three-dimensional visual map may include the following information:
  • Sample image pose: the sample image is a representative image used when constructing the 3D visual map; that is, the 3D visual map can be constructed based on the sample images, and the pose matrix of the sample image (which can be referred to as the sample image pose) can be stored in the 3D visual map. In other words, the map (i.e., the 3D visual map) may include sample image poses.
  • Sample global descriptor: each frame of sample image can correspond to an image global descriptor, which is recorded as the sample global descriptor. The sample global descriptor is a high-dimensional vector representing the sample image, and is used to distinguish the image features of different sample images.
  • the bag-of-words vector corresponding to the sample image can be determined based on the trained dictionary model, and the bag-of-words vector is determined as the sample global descriptor corresponding to the sample image.
  • the bag-of-words (Bag of Words) method is one way to determine a global descriptor: a bag-of-words vector can be constructed as a vector representation used for image similarity detection, and the bag-of-words vector can be used as the sample global descriptor corresponding to the sample image.
  • a "dictionary” also known as a dictionary model.
  • a classification tree is obtained after training.
  • Each classification tree can represent a visual "words”, and these visual "words” form a dictionary model.
  • all the feature point descriptors in the sample image can be classified as "words", and the frequency of occurrence of all words can be counted, so that the frequency of each word in the dictionary can form a vector, which is The bag-of-words vector corresponding to the sample image.
  • the bag-of-words vector can be used to measure the similarity between two frames of images, and the bag-of-words vector is used as a sample global descriptor corresponding to the sample image.
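  • a minimal bag-of-words sketch, assuming a flat KMeans vocabulary stands in for the classification-tree dictionary described above; the vocabulary size and names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_dictionary(all_descriptors, num_words=1000):
    """all_descriptors: (total, D) local descriptors pooled from many images."""
    return KMeans(n_clusters=num_words, n_init=4).fit(all_descriptors)

def bow_vector(dictionary, descriptors):
    """descriptors: (N, D) local descriptors of one image. Quantize each
    descriptor to its nearest visual word and histogram the word counts."""
    words = dictionary.predict(descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters)
    return hist / max(hist.sum(), 1)   # normalized word-frequency vector
```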
  • the sample image can be input to the trained deep learning model to obtain the target vector corresponding to the sample image, and determine the target vector as the sample global descriptor corresponding to the sample image.
  • the deep learning method is a way to determine the global descriptor.
  • the sample image can be processed through multiple layers of the deep learning model, finally obtaining a high-dimensional target vector.
  • the target vector is used as the sample global descriptor corresponding to the sample image.
  • the deep learning model may be, for example, a CNN (Convolutional Neural Network) model.
  • the sample image can be input to the deep learning model, and the deep learning model processes the sample image to obtain a high-dimensional target vector, which is used as the sample global descriptor corresponding to the sample image .
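  • a sketch of this deep-learning route, assuming a torchvision ResNet backbone stands in for whatever model is actually trained; the pooled activation serves as the high-dimensional target vector:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()        # keep the 512-d pooled feature
backbone.eval()

preprocess = T.Compose([T.ToTensor(), T.Resize((224, 224)),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

def global_descriptor(image):
    """image: PIL image or HxWx3 uint8 array. Returns a unit-length
    high-dimensional vector usable as a global descriptor."""
    with torch.no_grad():
        vec = backbone(preprocess(image).unsqueeze(0))[0]
    return torch.nn.functional.normalize(vec, dim=0)
```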
  • Sample local descriptors corresponding to the feature points of the sample image: each frame of sample image can include multiple feature points, and a feature point carries two parts of information: a specific pixel position in the sample image, and a descriptor of the local range around that position. That is, a feature point can correspond to an image local descriptor, recorded as the sample local descriptor; the sample local descriptor uses a vector to describe the feature of the image block in the vicinity of the feature point (i.e., the pixel position), and this vector can also be called the descriptor of the feature point. In other words, the sample local descriptor is a feature vector used to represent the image block where the feature point is located, and the image block is located in the sample image. It should be noted that a feature point in the sample image (a two-dimensional feature point) can correspond to a map point in the three-dimensional visual map (a three-dimensional map point); therefore, the sample local descriptor corresponding to the feature point may also be regarded as the sample local descriptor corresponding to that map point.
  • algorithms such as ORB (Oriented FAST and Rotated BRIEF), SIFT (Scale-Invariant Feature Transform), or SURF (Speeded Up Robust Features) can be used to extract feature points from the sample image and determine the sample local descriptors corresponding to the feature points; deep learning algorithms (such as SuperPoint, DELF, D2-Net, etc.) can also be used.
  • Map point information: may include, but is not limited to, the 3D spatial position of the map point, all sample images in which the map point is observed, and the numbers of the corresponding 2D feature points (that is, the feature points corresponding to the map point).
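  • one possible record layout for this map point information, sketched below; the field names are illustrative, not from the embodiment:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MapPoint:
    position: Tuple[float, float, float]                       # 3D spatial position
    observed_images: List[int] = field(default_factory=list)   # ids of observing sample images
    feature_ids: List[int] = field(default_factory=list)       # numbers of corresponding 2D feature points
```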
  • based on the pre-built 3D visual map of the target scene, after the server obtains the images to be tested, it determines the target map points corresponding to the images to be tested from the 3D visual map, and determines the global positioning track of the terminal device in the 3D visual map based on the target map points.
  • for each frame of the image to be tested, the server can determine the global positioning pose corresponding to the image to be tested. Assuming there are M frames of images to be tested, the M frames correspond to M global positioning poses, and the global positioning trajectory of the terminal device in the three-dimensional visual map can include the M global positioning poses; the global positioning trajectory can be understood as a collection of the M global positioning poses. For the first frame of the image to be tested among the M frames, the global positioning pose corresponding to the first frame is determined; for the second frame, the global positioning pose corresponding to the second frame is determined; and so on.
  • the global positioning pose is a pose point in the 3D visual map, that is, a pose point in the 3D visual map coordinate system.
  • these global positioning poses are composed into a global positioning track in the 3D visual map, and the global positioning track includes these global positioning poses.
  • the server may determine the global positioning track of the terminal device in the 3D visual map by using the following steps:
  • Step 401 the server acquires the image to be tested of the target scene from the terminal device.
  • the terminal device may acquire a target image that includes multiple frames of images; the terminal device may select M frames of images from the multiple frames as images to be tested, and send the M frames of images to be tested to the server.
  • the multi-frame images include key images and non-key images.
  • the terminal device may use the key images in the multi-frame images as the images to be tested, while the non-key images are not used as the images to be tested.
  • the terminal device can select the images to be tested from the multiple frames of images at a fixed interval. Assuming the fixed interval is 5 (the fixed interval can be configured according to experience; there is no limit on this), the 1st frame image can be used as an image to be tested, the 6th (1+5) frame as an image to be tested, the 11th (6+5) frame as an image to be tested, and so on, selecting one image to be tested every 5 frames, as in the sketch below.
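  • the fixed-interval selection amounts to a simple slice (0-based indexing, so this picks the 1st, 6th, 11th, ... frames of the example above):

```python
def select_images_to_test(frames, interval=5):
    """Pick one image to be tested every `interval` frames."""
    return frames[::interval]
```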
  • Step 402 for each frame of the image to be tested, determine the global descriptor to be tested corresponding to the image to be tested.
  • the image to be tested may correspond to an image global descriptor, recorded as the global descriptor to be tested; the global descriptor to be tested is a high-dimensional vector representing the image to be tested, and is used to distinguish the image features of different images to be tested.
  • the bag of words vector corresponding to the image to be tested is determined based on the trained dictionary model, and the bag of words vector is determined as the global descriptor to be tested corresponding to the image to be tested.
  • or, input the image to be tested into the trained deep learning model to obtain the target vector corresponding to the image to be tested, and determine the target vector as the global descriptor to be tested corresponding to the image to be tested.
  • the global descriptor to be tested corresponding to the image to be tested can be determined based on the bag of visual words method or the deep learning method.
  • for the determination method, refer to the determination method of the sample global descriptor, which will not be repeated here.
  • Step 403 For each frame of the image to be tested, determine the similarity between the global descriptor to be tested corresponding to the image to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the 3D visual map.
  • the three-dimensional visual map can include the sample global descriptor corresponding to each frame of sample image; therefore, the similarity between the global descriptor to be tested and each sample global descriptor can be determined. Taking "distance" as the measure of similarity, the distance between the global descriptor to be tested and each sample global descriptor can be determined, such as the Euclidean distance, that is, the Euclidean distance between the two feature vectors.
  • Step 404 Based on the distance between the global descriptor to be tested and each sample global descriptor, select candidate sample images from the multi-frame sample images corresponding to the 3D visual map; the distance between the global descriptor to be tested and the sample global descriptor corresponding to a candidate sample image is the minimum distance among the distances to all sample global descriptors, and/or that distance is less than a distance threshold.
  • for example, assuming the 3D visual map corresponds to sample image 1, sample image 2, and sample image 3: the distance 1 between the global descriptor to be tested and the sample global descriptor corresponding to sample image 1 can be calculated, along with the distance 2 for sample image 2 and the distance 3 for sample image 3.
  • if distance 1 is the minimum distance among distance 1, distance 2, and distance 3, sample image 1 is selected as the candidate sample image.
  • if distance 1 is less than the distance threshold (which can be configured based on experience) and distance 2 is less than the distance threshold, but distance 3 is not, then both sample image 1 and sample image 2 are selected as candidate sample images.
  • if distance 1 is the minimum distance and distance 1 is less than the distance threshold, sample image 1 is selected as the candidate sample image; but if distance 1 is the minimum distance and is not less than the distance threshold, no candidate sample image can be selected, i.e., relocation fails.
  • a candidate sample image corresponding to the image to be tested may be selected from multiple frames of sample images corresponding to the three-dimensional visual map, and the number of candidate sample images is at least one.
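  • a sketch of this candidate retrieval, assuming Euclidean distance over the global descriptors and combining the minimum-distance and distance-threshold rules above; the threshold value is illustrative:

```python
import numpy as np

def select_candidates(query_desc, sample_descs, distance_threshold=0.8):
    """query_desc: (D,) global descriptor to be tested; sample_descs: (S, D)
    sample global descriptors. Returns candidate sample-image indices, or
    an empty list when even the best match fails the threshold
    (relocation failure)."""
    dists = np.linalg.norm(sample_descs - query_desc, axis=1)
    best = int(np.argmin(dists))
    if dists[best] >= distance_threshold:
        return []                                  # relocation failed
    return [i for i, d in enumerate(dists) if d < distance_threshold]
```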
  • Step 405. For each frame of the image to be tested, obtain a plurality of feature points from the image to be tested, and for each feature point, determine a local descriptor to be tested corresponding to the feature point, and the local descriptor to be tested is used to represent the The feature vector of the image block where the feature point is located, and the image block may be located in the image to be tested.
  • the image to be tested may include a plurality of feature points, and the feature points may be specific pixel positions in the image to be tested.
  • the feature point may correspond to an image local descriptor, which is recorded as the local descriptor to be tested.
  • the local descriptor to be tested uses a vector to describe the feature of the image block in the vicinity of the feature point (that is, the pixel point position), and the vector can also be called the descriptor of the feature point.
  • the local descriptor to be tested is a feature vector used to represent the image block where the feature point is located.
  • algorithms such as ORB, SIFT, or SURF can be used to extract feature points from the image to be tested and determine the local descriptors to be tested corresponding to the feature points; deep learning algorithms (such as SuperPoint, DELF, D2-Net, etc.) can also be used. There is no limitation on this, as long as the feature points can be obtained and the local descriptors to be tested determined.
  • Step 406 For each feature point corresponding to the image to be tested, determine the distance between the local descriptor to be tested corresponding to the feature point and the sample local descriptor corresponding to each map point corresponding to the candidate sample image corresponding to the image to be tested (i.e., the map point corresponding to each feature point in the candidate sample image), such as the Euclidean distance, that is, the Euclidean distance between the two feature vectors.
  • the 3D visual map includes sample local descriptors corresponding to each map point corresponding to the sample image, therefore, after obtaining the candidate sample image corresponding to the image to be tested, from the 3D visual map Obtain the sample local descriptor corresponding to each map point corresponding to the candidate sample image. After each feature point corresponding to the image to be tested is obtained, the distance between the local descriptor to be tested corresponding to the feature point and the sample local descriptor corresponding to each map point corresponding to the candidate sample image is determined.
  • Step 407: For each feature point, based on the distances between the local descriptor to be tested corresponding to the feature point and the sample local descriptors corresponding to the map points of the candidate sample image, select a target map point from the multiple map points corresponding to the candidate sample image. The distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is the minimum distance among the distances between the local descriptor to be tested and each sample local descriptor, and/or, the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is smaller than a distance threshold.
  • For example, assuming that the candidate sample image corresponds to map point 1, map point 2 and map point 3, the distance 1 between the local descriptor to be tested corresponding to the feature point and the sample local descriptor corresponding to map point 1 can be calculated, the distance 2 between the local descriptor to be tested and the sample local descriptor corresponding to map point 2 can be calculated, and the distance 3 between the local descriptor to be tested and the sample local descriptor corresponding to map point 3 can be calculated.
  • If distance 1 is the minimum distance among distance 1, distance 2 and distance 3, map point 1 may be selected as the target map point.
  • If distance 1 is less than the distance threshold, map point 1 may be selected as the target map point.
  • If distance 1 is the minimum distance and distance 1 is less than the distance threshold, map point 1 can be selected as the target map point; however, if distance 1 is the minimum distance but is not less than the distance threshold, no target map point can be selected, i.e., relocation fails.
  • In this way, for each feature point, the target map point corresponding to the feature point is selected from the map points of the candidate sample image corresponding to the image to be tested, and the matching relationship between the feature point and the target map point is obtained.
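A minimal sketch of the matching in steps 406 to 407, assuming the descriptors are already available as vectors; brute-force Euclidean matching is shown for clarity, and the threshold value is an assumption:

```python
# For one feature point, pick the map point whose sample local descriptor is
# nearest to the local descriptor to be tested and within a threshold.
import numpy as np

LOCAL_DIST_THRESHOLD = 0.7  # "configured based on experience" (assumed value)

def match_feature_to_map_point(test_desc: np.ndarray,
                               map_point_descs: np.ndarray):
    """test_desc: (D,) local descriptor to be tested.
    map_point_descs: (M, D) sample local descriptors of the candidate sample
    image's map points. Returns the target map point index, or None when the
    minimum distance is not below the threshold (no target map point)."""
    dists = np.linalg.norm(map_point_descs - test_desc, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] < LOCAL_DIST_THRESHOLD else None
```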
  • Step 408: Based on the multiple feature points corresponding to the image to be tested and the target map points corresponding to the multiple feature points, determine the global positioning pose in the 3D visual map corresponding to the image to be tested.
  • The image to be tested may correspond to multiple feature points, and each feature point corresponds to a target map point. For example, the target map point corresponding to feature point 1 is map point 1, the target map point corresponding to feature point 2 is map point 2, and so on, so that multiple matching relationship pairs are obtained.
  • Each matching relationship pair includes a feature point (i.e., a two-dimensional feature point) and a map point (i.e., a three-dimensional map point in the three-dimensional visual map). The feature point represents a two-dimensional position in the image to be tested, and the map point represents a three-dimensional position in the three-dimensional visual map. That is, each matching relationship pair includes a mapping relationship from a two-dimensional position in the image to be tested to a three-dimensional position in the three-dimensional visual map.
  • If the total number of the multiple matching relationship pairs does not reach the quantity requirement (that is, the total number is less than a preset number), the global positioning pose in the three-dimensional visual map corresponding to the image to be tested cannot be determined based on the matching relationship pairs. If the total number of the multiple matching relationship pairs reaches the quantity requirement (that is, the total number reaches the preset number), the global positioning pose in the three-dimensional visual map corresponding to the image to be tested can be determined based on the multiple matching relationship pairs.
  • For example, the PnP (Perspective-n-Point) algorithm can be used to calculate the global positioning pose in the three-dimensional visual map corresponding to the image to be tested, and the calculation method is not limited.
  • the input data of the PnP algorithm is a plurality of matching relationship pairs.
  • the matching relationship pair includes the two-dimensional position in the image to be tested and the three-dimensional position in the three-dimensional visual map.
  • the PnP algorithm can be used to calculate the pose of the image to be tested in the three-dimensional visual map, that is, the global positioning pose.
  • the global positioning pose in the 3D visual map corresponding to the image to be tested is obtained, that is, the global positioning pose corresponding to the image to be tested in the 3D visual map coordinate system is obtained.
  • In a possible implementation, valid matching relationship pairs may first be selected from the multiple matching relationship pairs; for example, the RANSAC (RANdom SAmple Consensus) algorithm can be used to select the valid matching relationship pairs, and then the PnP algorithm can be used on the valid matching relationship pairs to calculate the global positioning pose in the 3D visual map corresponding to the image to be tested.
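Step 408 can be illustrated with OpenCV's RANSAC-wrapped PnP solver. In this sketch the camera intrinsic matrix K, the minimum pair count and the array names are assumptions, not values from the embodiment:

```python
# Solve the global positioning pose from 2D-3D matching relationship pairs.
import cv2
import numpy as np

def global_pose_from_matches(pts_2d, pts_3d, K):
    """pts_2d: (n, 2) feature point positions in the image to be tested.
    pts_3d: (n, 3) target map point positions in the 3D visual map.
    Returns the rotation and translation of the image in map coordinates,
    or None when there are too few pairs or RANSAC fails (relocation failed)."""
    if len(pts_2d) < 6:  # "quantity requirement" (preset number, assumed)
        return None
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64), K, None)
    if not ok or inliers is None:
        return None
    R, _ = cv2.Rodrigues(rvec)          # camera-from-map rotation
    R_wc, t_wc = R.T, -R.T @ tvec       # map-from-camera: the global pose
    return R_wc, t_wc
```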
  • Step 409 Generate a global positioning track of the terminal device in the 3D visual map based on the global positioning poses corresponding to the M frames of images to be tested acquired in step 401 , the global positioning track includes multiple global positioning poses in the 3D visual map. So far, the server can obtain the global positioning trajectory in the 3D visual map, that is, the global positioning trajectory in the coordinate system of the 3D visual map.
  • The global positioning trajectory can include the global positioning poses corresponding to the M frames of images to be tested, that is, the global positioning trajectory may include M global positioning poses. Since the M frames of images to be tested are partial images selected from all images, the global positioning trajectory includes global positioning poses corresponding to a small number of images; that is, the server obtains a global positioning trajectory with a low frame rate.
  • After the server obtains the high-frame-rate self-positioning trajectory and the low-frame-rate global positioning trajectory, it fuses the two to obtain a high-frame-rate trajectory in the 3D visual map coordinate system, that is, the fused positioning trajectory of the terminal device in the 3D visual map.
  • The fused positioning trajectory is a high-frame-rate pose sequence in the 3D visual map, while the global positioning trajectory is a low-frame-rate pose sequence in the 3D visual map; that is, the frame rate of the fused positioning trajectory is higher than the frame rate of the global positioning trajectory, and the number of fused positioning poses is greater than the number of global positioning poses.
  • As shown in FIG. 5, each white solid circle represents a self-localization pose, and a trajectory composed of multiple self-localization poses is called a self-localization trajectory; that is, the self-localization trajectory includes multiple self-localization poses.
  • The self-localization pose corresponding to the first frame image can serve as the coordinate origin of the reference coordinate system S_L (that is, the self-localization coordinate system); in other words, the self-localization pose corresponding to the first frame image coincides with the coordinate origin of S_L. Each self-localization pose in the self-localization trajectory is a pose in the reference coordinate system S_L.
  • Each gray solid circle represents a global positioning pose, and a trajectory composed of multiple global positioning poses is called the global positioning trajectory; that is, the global positioning trajectory includes multiple global positioning poses. Each global positioning pose in the global positioning trajectory is a pose in the 3D visual map coordinate system S_G, that is, a global positioning pose in the 3D visual map.
  • Each white dotted circle represents a fused positioning pose, and a trajectory composed of multiple fused positioning poses is called the fused positioning trajectory; that is, the fused positioning trajectory includes multiple fused positioning poses. Each fused positioning pose in the fused positioning trajectory is a pose in the 3D visual map coordinate system S_G, that is, a fused positioning pose in the 3D visual map.
  • Each frame of image corresponds to a self-localization pose, while only a part of the frames is selected from the multiple frames of images as images to be tested, and each frame of the image to be tested corresponds to a global positioning pose; therefore, the number of self-localization poses is larger than the number of global positioning poses.
  • Each self-localization pose corresponds to one fused positioning pose (that is, the self-localization poses and the fused positioning poses correspond one-to-one), so the number of fused positioning poses is the same as the number of self-localization poses and is therefore also larger than the number of global positioning poses.
  • The server can implement a trajectory fusion function and a pose transformation function. For example, the server can perform the following steps to obtain the fused positioning trajectory in the 3D visual map:
  • Step 601: Select N self-localization poses corresponding to a target time period from all self-localization poses included in the self-localization trajectory, and select P global positioning poses corresponding to the target time period from all global positioning poses included in the global positioning trajectory; for example, N may be greater than P.
  • The N self-localization poses corresponding to the target time period are the self-localization poses determined based on the images collected during the target time period, and the P global positioning poses corresponding to the target time period are the global positioning poses determined based on the images to be tested collected during the target time period.
  • Step 602: Based on the N self-localization poses and the P global positioning poses, determine N fused positioning poses corresponding to the N self-localization poses; the N self-localization poses correspond to the N fused positioning poses one-to-one.
  • For example, based on the N self-localization poses and the P global positioning poses, the fused positioning pose corresponding to each self-localization pose can be determined: the fused positioning pose corresponding to the first self-localization pose is determined, the fused positioning pose corresponding to the second self-localization pose is determined, and so on.
  • In other words, there are N self-localization poses, P global positioning poses and N fused positioning poses; the N self-localization poses are all known values, the P global positioning poses are all known values, and the N fused positioning poses are all unknown values, i.e., the pose values that need to be solved.
  • Each self-localization pose corresponds to one fused positioning pose, and each global positioning pose also corresponds to one of the fused positioning poses (namely the fused positioning pose at the same moment).
  • A first constraint value can be determined based on the N self-localization poses and the N fused positioning poses. The first constraint value is used to represent the residual between the fused positioning poses and the self-localization poses, and can be calculated, for example, based on the differences between each fused positioning pose and its corresponding self-localization pose. The calculation formula of the first constraint value is not limited in this embodiment; it only needs to be related to the above differences.
  • A second constraint value can be determined based on the P global positioning poses and the P corresponding fused positioning poses (that is, the P fused positioning poses corresponding to the P global positioning poses are selected from the N fused positioning poses). The second constraint value is used to represent the residual (which can be an absolute difference) between the fused positioning poses and the global positioning poses, and can be calculated, for example, based on the differences between each global positioning pose and its corresponding fused positioning pose. The calculation formula of the second constraint value is not limited in this embodiment; it only needs to be related to the above differences.
  • The target constraint value may be calculated based on the first constraint value and the second constraint value; for example, the target constraint value may be the sum of the first constraint value and the second constraint value. Since the N self-localization poses and the P global positioning poses are all known values and the N fused positioning poses are all unknown values, the values of the N fused positioning poses are adjusted so that the target constraint value is minimized. When the target constraint value reaches its minimum, the values of the N fused positioning poses are the finally solved pose values; at this point, the values of the N fused positioning poses are obtained.
  • For example, the following formula (1) can be used to calculate the target constraint value:

    $F(T) = \sum_{i=1}^{N-1} e_{i,i+1}^{\top}\,\Omega_{i,i+1}\,e_{i,i+1} + \sum_{k=1}^{P} e_{k}^{\top}\,\Lambda_{k}\,e_{k}$    (1)

  • F(T) represents the target constraint value; the part before the plus sign (subsequently recorded as the first part) is the first constraint value, and the part after the plus sign (subsequently recorded as the second part) is the second constraint value.
  • $\Omega_{i,i+1}$ is the residual information matrix for the self-localization poses, and $\Lambda_{k}$ is the residual information matrix for the global positioning poses; both can be configured according to experience, and there is no restriction on this.
  • The first part represents the relative transformation constraint between the self-localization poses and the fused positioning poses, which is reflected by the first constraint value; $e_{i,i+1}$ is the residual between the relative transformation of adjacent fused positioning poses and the relative transformation of the corresponding self-localization poses, and N is the number of all self-localization poses in the self-localization trajectory, that is, N self-localization poses.
  • The second part represents the global positioning constraint between the global positioning poses and the fused positioning poses, which is reflected by the second constraint value; for each fused positioning pose that has a corresponding global positioning pose, $e_{k}$ represents the residual of the fused positioning pose relative to that global positioning pose, and P is the number of all global positioning poses in the global positioning trajectory, that is, P global positioning poses.
  • The optimization goal is to minimize the value of F(T), that is, $T^{*} = \arg\min_{T} F(T)$. By minimizing the value of F(T), the fused positioning trajectory in the 3D visual map coordinate system can be obtained, and the fused positioning trajectory can include multiple fused positioning poses.
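To make the optimization concrete, the following sketch minimizes a simplified version of formula (1) with SciPy: poses are reduced to 2D positions and the information matrices are folded into scalar weights, which is an assumption for brevity (a full implementation would optimize poses on SE(3)):

```python
# Fuse a high-frame-rate self-positioning trajectory with a low-frame-rate
# global positioning trajectory by least squares.
import numpy as np
from scipy.optimize import least_squares

def fuse(self_xy, global_xy, global_idx, w_rel=1.0, w_abs=1.0):
    """self_xy: (N, 2) self-positioning positions (known).
    global_xy: (P, 2) global positions (known); global_idx: (P,) integer
    indices of the poses among the N that have a global result.
    Returns the (N, 2) fused positions."""
    N = len(self_xy)

    def residuals(flat):
        fused = flat.reshape(N, 2)
        # First constraint: relative motion should match self-positioning.
        r1 = w_rel * ((fused[1:] - fused[:-1]) - (self_xy[1:] - self_xy[:-1]))
        # Second constraint: fused poses should match global positioning.
        r2 = w_abs * (fused[global_idx] - global_xy)
        return np.concatenate([r1.ravel(), r2.ravel()])

    sol = least_squares(residuals, self_xy.ravel())  # self poses as initial guess
    return sol.x.reshape(N, 2)
```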
  • Step 603: Generate a fused positioning trajectory of the terminal device in the 3D visual map based on the N fused positioning poses, where the fused positioning trajectory includes the N fused positioning poses in the 3D visual map.
  • So far, the server obtains the fused positioning trajectory in the 3D visual map, that is, the fused positioning trajectory in the 3D visual map coordinate system. The number of fused positioning poses in the fused positioning trajectory is greater than the number of global positioning poses in the global positioning trajectory; that is to say, a fused positioning trajectory with a high frame rate is obtained.
  • Step 604: Select an initial fused positioning pose from the fused positioning trajectory, and select an initial self-localization pose corresponding to the initial fused positioning pose from the self-localization trajectory.
  • Step 605: Select a target self-localization pose from the self-localization trajectory, and determine a target fused positioning pose based on the initial fused positioning pose, the initial self-localization pose and the target self-localization pose.
  • After the fused positioning trajectory is obtained, it can also be updated. Specifically, the initial fused positioning pose can be selected from the fused positioning trajectory, the initial self-localization pose corresponding to the initial fused positioning pose can be selected from the self-localization trajectory, and the target self-localization pose can be selected from the self-localization trajectory. The target fused positioning pose can then be determined based on the initial fused positioning pose, the initial self-localization pose and the target self-localization pose, and a new fused positioning trajectory may be generated based on the target fused positioning pose and the existing fused positioning trajectory to replace the original fused positioning trajectory.
  • For example, the self-localization trajectory includes the self-localization poses within the target time period, the global positioning trajectory includes the global positioning poses within the target time period, and the fused positioning trajectory includes the fused positioning poses within the target time period. After that, if a new self-localization pose outside the target time period is obtained, its fused positioning pose can be determined through pose transformation.
  • The following formula (4) can also be used to determine the fused positioning pose corresponding to a new self-localization pose:

    $T^{fuse}_{j} = T^{fuse}_{i}\,\bigl(T^{self}_{i}\bigr)^{-1}\,T^{self}_{j}$    (4)

  • In formula (4), $T^{fuse}_{j}$ represents the fused positioning pose corresponding to the self-localization pose $T^{self}_{j}$, that is, the target fused positioning pose; $T^{fuse}_{i}$ represents the initial fused positioning pose selected from the fused positioning trajectory; $T^{self}_{i}$ represents the initial self-localization pose, which is selected from the self-localization trajectory and corresponds to the initial fused positioning pose; and $T^{self}_{j}$ represents the target self-localization pose selected from the self-localization trajectory.
  • In this way, the target fused positioning pose can be determined based on the initial fused positioning pose, the initial self-localization pose and the target self-localization pose, and a new fused positioning trajectory can be generated; that is, the new fused positioning trajectory can include the target fused positioning pose, and the fused positioning trajectory is thereby updated.
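Using 4x4 homogeneous pose matrices, the pose transformation of formula (4) is a single matrix composition. The function below is a sketch with illustrative names:

```python
# Target fused pose = initial fused pose composed with the self-positioning
# motion between the initial and target timestamps.
import numpy as np

def transform_pose(T_fuse_init, T_self_init, T_self_target):
    """All inputs are 4x4 homogeneous pose matrices; the result is the target
    fused positioning pose in the 3D visual map coordinate system S_G."""
    return T_fuse_init @ np.linalg.inv(T_self_init) @ T_self_target
```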
  • Among the above steps, step 601 to step 603 constitute the trajectory fusion process, and step 604 to step 605 constitute the pose transformation process.
  • Trajectory fusion is the process of registering and fusing the self-localization trajectory with the global positioning trajectory, so as to realize the transformation from the self-localization coordinate system to the 3D visual map coordinate system and to use the global positioning results to correct the trajectory.
  • Since not all frames can successfully obtain a global positioning pose, the poses corresponding to those frames are output as fused positioning poses in the 3D visual map coordinate system through pose transformation, that is, the pose transformation process.
  • It is necessary to pre-build a 3D visualization map of the target scene and store the 3D visualization map in the server, and the server can display the trajectory based on the 3D visualization map.
  • The 3D visualization map is a viewable map of the target scene that is mainly used for trajectory display; it can be obtained, for example, through laser scanning and manual modeling, or by using a composition algorithm. This application does not limit the construction method of the 3D visualization map.
  • Since positioning is performed in the 3D visual map of the target scene while display is performed in the 3D visualization map of the target scene, it is necessary to register the 3D visual map and the 3D visualization map to ensure that the two maps are aligned in space. For example, the 3D visualization map is sampled to change it from a triangular patch form into a dense point cloud form, and this point cloud is registered with the 3D point cloud of the 3D visual map through the ICP (Iterative Closest Point) algorithm to obtain the transformation matrix T from the 3D visual map to the 3D visualization map; finally, the transformation matrix T is used to transform the 3D visual map into the 3D visualization map coordinate system, obtaining a 3D visual map aligned with the 3D visualization map.
  • In a possible implementation, the transformation matrix T (referred to as the target transformation matrix) can be determined in any of the following manners:
  • Method 1: When constructing the 3D visual map and the 3D visualization map, multiple calibration points can be deployed in the target scene (different calibration points can be distinguished by different shapes, so that the calibration points can be identified from images). The 3D visual map can include the multiple calibration points, and the 3D visualization map can also include the multiple calibration points. For each of the multiple calibration points, a coordinate pair corresponding to the calibration point can be determined; the coordinate pair includes the position coordinates of the calibration point in the 3D visual map and the position coordinates of the calibration point in the 3D visualization map. The target transformation matrix can then be determined based on the coordinate pairs corresponding to the multiple calibration points.
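Method 1 amounts to estimating a rigid transform from point correspondences. One standard way to solve it, not prescribed by the embodiment, is the Kabsch/Umeyama closed form:

```python
# Estimate the 4x4 target transformation matrix from calibration-point pairs.
import numpy as np

def rigid_transform(src, dst):
    """src: (n, 3) calibration-point coordinates in the 3D visual map.
    dst: (n, 3) the same calibration points in the 3D visualization map.
    Returns a 4x4 matrix T mapping src coordinates onto dst coordinates."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = c_dst - R @ c_src
    return T
```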
  • Method 2: Obtain an initial transformation matrix, map the position coordinates in the 3D visual map to mapping coordinates in the 3D visualization map based on the initial transformation matrix, and determine whether the initial transformation matrix has converged based on the difference between the mapping coordinates and the actual coordinates in the 3D visualization map. If it has converged, the initial transformation matrix is determined as the target transformation matrix, that is, the target transformation matrix is obtained; if not, the initial transformation matrix is adjusted, the adjusted transformation matrix is used as the initial transformation matrix, and the operation of mapping the position coordinates in the 3D visual map to mapping coordinates in the 3D visualization map based on the initial transformation matrix is performed again, and so on, until the target transformation matrix is obtained.
  • In Method 2, an initial transformation matrix is obtained first, and there is no restriction on how it is obtained: it can be a randomly set initial transformation matrix, or an initial transformation matrix obtained by a certain algorithm. This initial transformation matrix is the matrix to be iteratively optimized; that is, the initial transformation matrix is continuously optimized through iterations, and the iteratively optimized matrix is used as the target transformation matrix.
  • After the initial transformation matrix is obtained, the position coordinates in the 3D visual map can be mapped to mapping coordinates in the 3D visualization map based on the initial transformation matrix. The mapping coordinates in the 3D visualization map are the coordinates transformed through the initial transformation matrix, while the actual coordinates in the 3D visualization map are the real coordinates in the 3D visualization map corresponding to the position coordinates in the 3D visual map.
  • The smaller the difference between the mapping coordinates and the actual coordinates, the higher the accuracy of the initial transformation matrix; the larger the difference, the worse the accuracy. Therefore, whether the initial transformation matrix has converged can be determined based on the difference between the mapping coordinates and the actual coordinates.
  • The difference between the mapping coordinates and the actual coordinates can be the sum of multiple sets of differences, where each set of differences corresponds to the difference between one mapping coordinate and its actual coordinate. If the difference between the mapping coordinates and the actual coordinates is less than a threshold, it is determined that the initial transformation matrix has converged; if the difference is not less than the threshold, it is determined that the initial transformation matrix has not converged.
  • If the initial transformation matrix has not converged, the initial transformation matrix can be adjusted, and there is no restriction on the adjustment process; for example, the ICP algorithm can be used to adjust the initial transformation matrix. The adjusted transformation matrix is used as the initial transformation matrix, and the operation of mapping the position coordinates in the 3D visual map to mapping coordinates in the 3D visualization map based on the initial transformation matrix is performed again, and so on, until the target transformation matrix is obtained. If the initial transformation matrix has converged, the initial transformation matrix is determined as the target transformation matrix.
  • Method 3: Sample the 3D visual map to obtain a first point cloud corresponding to the 3D visual map, and sample the 3D visualization map to obtain a second point cloud corresponding to the 3D visualization map. The ICP algorithm is then used to register the first point cloud and the second point cloud, obtaining the target transformation matrix between the 3D visual map and the 3D visualization map.
  • In Method 3, the first point cloud and the second point cloud are obtained, each including a large number of 3D points. Based on the 3D points of the first point cloud and the 3D points of the second point cloud, the ICP algorithm can be used for registration, and the registration process is not limited.
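As an illustration of Method 3, Open3D's point-to-point ICP can register the two sampled point clouds; the correspondence distance and the initial guess are scene-dependent assumptions:

```python
# Register the first point cloud (from the 3D visual map) to the second
# point cloud (from the 3D visualization map) and return the matrix T.
import numpy as np
import open3d as o3d

def register_maps(pcd_visual, pcd_visualization, T_init=np.eye(4)):
    result = o3d.pipelines.registration.registration_icp(
        pcd_visual, pcd_visualization,
        max_correspondence_distance=0.5,   # scene-dependent (assumed value)
        init=T_init,
        estimation_method=o3d.pipelines.registration
            .TransformationEstimationPointToPoint())
    return result.transformation           # the target transformation matrix
```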
  • On this basis, the server can convert the fused positioning pose into a target positioning pose in the 3D visualization map based on the target transformation matrix between the 3D visual map and the 3D visualization map, and display the target positioning pose through the 3D visualization map.
  • the manager can open a web browser and access the server through the network to view the target positioning poses displayed in the three-dimensional visualization map, and these target positioning poses form a trajectory.
  • In this way, the server can display the target positioning pose of the terminal device on the three-dimensional visualization map, so that managers can view the target positioning pose displayed on the three-dimensional visualization map.
  • Managers can change the viewing angle by dragging the mouse to realize 3D viewing of the track.
  • the server includes client software, and the client software reads and renders the 3D visualization map, and displays the target positioning pose on the 3D visualization map.
  • A user (such as a manager) can change the viewing angle of the three-dimensional visualization map by dragging the mouse.
  • In summary, a positioning and display method combining the cloud and the edge is proposed: the terminal device calculates the high-frame-rate self-positioning trajectory and sends only the self-positioning trajectory and a small number of images to be tested, reducing the amount of data transmitted over the network, while global positioning is performed on the server, thereby reducing the consumption of computing resources and storage resources of the terminal device.
  • the system architecture of cloud-edge integration can share the computing pressure, reduce the hardware cost of terminal equipment, and reduce the amount of network transmission data.
  • the final positioning result can be displayed on a 3D visualization map, and the management personnel access the server through the web terminal for interactive display.
  • an embodiment of the present application proposes a cloud edge management system, the cloud edge management system includes a terminal device and a server, and the server includes a three-dimensional visual map of a target scene.
  • the terminal device is used to acquire a target image of the target scene and motion data of the terminal device during the process of moving in the target scene, and determine a self-positioning trajectory of the terminal device based on the target image and the motion data; if the target image includes multiple frames of images, select a part of the frame images from the multiple frames of images as images to be tested, and send the images to be tested and the self-positioning trajectory to the server.
  • the server is configured to generate a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, the fused positioning trajectory including a plurality of fused positioning poses; and, for each fused positioning pose in the fused positioning trajectory, determine a target positioning pose corresponding to the fused positioning pose and display the target positioning pose.
  • the terminal device includes a visual sensor and a motion sensor; wherein, the visual sensor is used to obtain the target image of the target scene, and the motion sensor is used to obtain the motion data of the terminal device .
  • the terminal device is a wearable device, and the visual sensor and the motion sensor are deployed on the wearable device; or, the terminal device is a recorder, and the visual sensor and the motion sensor are deployed on the on the recorder; or, the terminal device is a camera, and the vision sensor and the motion sensor are deployed on the camera.
  • When the server generates the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, it is specifically used to: determine the target map point corresponding to the image to be tested from the three-dimensional visual map, and determine the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map point; and generate the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory. The frame rate of the fused positioning poses included in the fused positioning trajectory is greater than the frame rate of the global positioning poses included in the global positioning trajectory, and is equal to the frame rate of the self-localization poses included in the self-positioning trajectory.
  • When the server determines the target positioning pose corresponding to the fused positioning pose and displays the target positioning pose, it is specifically used to: convert the fused positioning pose into the target positioning pose in the 3D visualization map based on the target transformation matrix between the 3D visual map and the 3D visualization map, and display the target positioning pose through the 3D visualization map.
  • The server includes client software, and the client software reads and renders the three-dimensional visualization map and displays the target positioning pose on the three-dimensional visualization map. A user accesses the client software through a Web browser to view the target positioning pose displayed in the three-dimensional visualization map; when viewing the target positioning pose through the client software, the user can drag the mouse to change the viewing angle of the three-dimensional visualization map.
  • A pose display device is proposed in the embodiments of the present application, which is applied to the server in the cloud edge management system, where the server includes a three-dimensional visual map of the target scene. FIG. 7 is a structural diagram of the pose display device.
  • The pose display device includes: an acquisition module 71, configured to acquire an image to be tested and a self-positioning trajectory, where the self-positioning trajectory is determined by the terminal device based on the target image of the target scene and the motion data of the terminal device, and the image to be tested is a partial frame image among the multiple frames of images included in the target image; a generation module 72, configured to generate a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, the fused positioning trajectory including a plurality of fused positioning poses; and a display module 73, configured to determine, for each fused positioning pose in the fused positioning trajectory, a target positioning pose corresponding to the fused positioning pose, and display the target positioning pose.
  • When the generation module 72 generates the fused positioning trajectory of the terminal device in the 3D visual map based on the image to be tested and the self-positioning trajectory, it is specifically used to: determine the target map point corresponding to the image to be tested from the 3D visual map, and determine the global positioning trajectory of the terminal device in the 3D visual map based on the target map point; and generate the fused positioning trajectory of the terminal device in the 3D visual map based on the self-positioning trajectory and the global positioning trajectory. The frame rate of the fused positioning poses included in the fused positioning trajectory is greater than the frame rate of the global positioning poses included in the global positioning trajectory, and is equal to the frame rate of the self-localization poses included in the self-positioning trajectory.
  • the three-dimensional visual map includes at least one of the following: a pose matrix corresponding to the sample image, a sample global descriptor corresponding to the sample image, a sample local descriptor corresponding to a feature point in the sample image, and map point information;
  • When the generation module 72 determines the target map point corresponding to the image to be tested from the three-dimensional visual map and determines the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map point, it is specifically used to: for each frame of the image to be tested, select candidate sample images from the multiple frames of sample images based on the similarity between the image to be tested and the multiple frames of sample images corresponding to the three-dimensional visual map; obtain multiple feature points from the image to be tested; for each feature point, determine the target map point corresponding to the feature point from the multiple map points corresponding to the candidate sample image; determine the global positioning pose in the three-dimensional visual map corresponding to the image to be tested based on the multiple feature points and the target map points corresponding to the multiple feature points; and generate the global positioning trajectory of the terminal device in the three-dimensional visual map based on the global positioning poses corresponding to the images to be tested.
  • When the generation module 72 selects candidate sample images from the multiple frames of sample images based on the similarity between the image to be tested and the multiple frames of sample images corresponding to the three-dimensional visual map, it is specifically used to: determine the global descriptor to be tested corresponding to the image to be tested, and determine the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map; and select the candidate sample image from the multiple frames of sample images based on the distance between the global descriptor to be tested and each sample global descriptor. The distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance among the distances between the global descriptor to be tested and each sample global descriptor, and/or, the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is less than the distance threshold.
  • When the generation module 72 determines the global descriptor to be tested corresponding to the image to be tested, it is specifically used to: determine the bag-of-words vector corresponding to the image to be tested based on a trained dictionary model, and determine the bag-of-words vector as the global descriptor to be tested corresponding to the image to be tested; or, input the image to be tested into a trained deep learning model to obtain a target vector corresponding to the image to be tested, and determine the target vector as the global descriptor to be tested corresponding to the image to be tested.
  • When the generation module 72 determines the target map point corresponding to the feature point from the plurality of map points corresponding to the candidate sample image, it is specifically used to: determine the local descriptor to be tested corresponding to the feature point, where the local descriptor to be tested is used to represent the feature vector of the image block where the feature point is located, and the image block is located in the image to be tested; determine the distance between the local descriptor to be tested and the sample local descriptor corresponding to each map point corresponding to the candidate sample image; and select the target map point from the plurality of map points corresponding to the candidate sample image based on the distance between the local descriptor to be tested and each sample local descriptor. The distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is the minimum distance among the distances between the local descriptor to be tested and the sample local descriptors corresponding to the map points of the candidate sample image, and/or, the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is less than the distance threshold.
  • When the generation module 72 generates the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory, it is specifically used to: select N self-positioning poses corresponding to the target time period from all self-positioning poses included in the self-positioning trajectory, and select P global positioning poses corresponding to the target time period from all global positioning poses included in the global positioning trajectory, where N is greater than P; determine N fused positioning poses corresponding to the N self-positioning poses based on the N self-positioning poses and the P global positioning poses, the N self-positioning poses corresponding to the N fused positioning poses one-to-one; and generate the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the N fused positioning poses.
  • When the display module 73 determines the target positioning pose corresponding to the fused positioning pose and displays the target positioning pose, it is specifically used to: convert the fused positioning pose into the target positioning pose in the three-dimensional visualization map based on the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map. The display module 73 is also used to determine the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map in the following manner: for each of multiple calibration points in the target scene, determine a coordinate pair corresponding to the calibration point, the coordinate pair including the position coordinates of the calibration point in the three-dimensional visual map and the position coordinates of the calibration point in the three-dimensional visualization map, and determine the target transformation matrix based on the coordinate pairs corresponding to the multiple calibration points; or, obtain an initial transformation matrix, map the position coordinates in the three-dimensional visual map to mapping coordinates in the three-dimensional visualization map based on the initial transformation matrix, and determine whether the initial transformation matrix has converged based on the difference between the mapping coordinates and the actual coordinates in the three-dimensional visualization map; if so, determine the initial transformation matrix as the target transformation matrix; if not, adjust the initial transformation matrix and repeat the mapping operation until the target transformation matrix is obtained; or, sample the three-dimensional visual map to obtain a first point cloud, sample the three-dimensional visualization map to obtain a second point cloud, and register the first point cloud and the second point cloud by using the ICP algorithm to obtain the target transformation matrix.
  • the server may include: a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor; the processor is configured to execute the machine-executable instructions to implement the pose display method disclosed in the above examples of the present application.
  • The embodiment of the present application also provides a machine-readable storage medium on which several computer instructions are stored; when the computer instructions are executed by a processor, the pose display method disclosed in the above examples of the present application can be implemented.
  • the above-mentioned machine-readable storage medium may be any electronic, magnetic, optical or other physical storage device, which may contain or store information, such as executable instructions, data, and so on.
  • the machine-readable storage medium can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid state drive, any type of storage disk (such as a CD or DVD), or a similar storage medium, or a combination thereof.
  • A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, desktop computer, tablet computer, wearable device, or any combination of these devices.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are performed on the computer or other programmable equipment to produce computer-implemented processing, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.


Abstract

Provided are a pose display method and apparatus, and a system. The method comprises: during the process of a terminal device moving in a target scene, acquiring a target image of the target scene and motion data of the terminal device, and determining a self-positioning trajectory on the basis of the target image and the motion data (101); if the target image comprises a plurality of frames of images, the terminal device selecting, from the plurality of frames of image, some images as images to be subjected to detection, and sending said images and the self-positioning trajectory to a server (102); the server generating, on the basis of said images and the self-positioning trajectory, a fused positioning trajectory of the terminal device in a three-dimensional visual map, wherein the fused positioning trajectory comprises a plurality of fused positioning poses (103); and for each fused positioning pose, the server determining a target positioning pose corresponding to the fused positioning pose, and displaying the target positioning pose (104). In this way, a positioning function with a high frame rate and high precision is realized, and a terminal device only sends a self-positioning trajectory and images to be subjected to detection, thereby reducing the amount of data transmitted by a network, and reducing the computing resource consumption and storage resource consumption of the terminal device.

Description

Pose display method, device and system, server and storage medium

Technical field
The present application relates to the field of computer vision, and in particular to a pose display method, device and system, a server, and a machine-readable storage medium.
Background
GPS (Global Positioning System) is a high-precision radio navigation and positioning system that can provide accurate geographic location, vehicle speed and precise time information anywhere in the world and in near-earth space. The Beidou satellite navigation system consists of three parts: the space segment, the ground segment and the user segment. It can provide users with high-precision, high-reliability positioning, navigation and timing services around the clock and around the world, and has regional navigation, positioning and timing capabilities.
Since terminal devices are equipped with GPS or the Beidou satellite navigation system, when a terminal device needs to be positioned, GPS or the Beidou satellite navigation system can be used to locate it. In an outdoor environment, the GPS or Beidou signal is relatively good, so GPS or the Beidou satellite navigation system can locate the terminal device accurately. However, in an indoor environment, the GPS or Beidou signal is relatively poor, so the GPS or Beidou satellite navigation system cannot locate the terminal device accurately. For example, in coal, electric power, petrochemical and other energy industries, there are more and more positioning requirements, and these requirements generally arise in indoor environments; due to problems such as signal occlusion, terminal devices cannot be accurately located.
Summary of the invention
The present application provides a pose display method applied to a cloud edge management system. The cloud edge management system includes a terminal device and a server, and the server includes a three-dimensional visual map of a target scene. The method includes: during the process of the terminal device moving in the target scene, acquiring a target image of the target scene and motion data of the terminal device, and determining a self-positioning trajectory of the terminal device based on the target image and the motion data; if the target image includes multiple frames of images, selecting a part of the frame images from the multiple frames of images as images to be tested, and sending the images to be tested and the self-positioning trajectory to the server; generating, by the server, a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the fused positioning trajectory including a plurality of fused positioning poses; and for each fused positioning pose in the fused positioning trajectory, determining, by the server, a target positioning pose corresponding to the fused positioning pose, and displaying the target positioning pose.
The present application provides a cloud edge management system, which includes a terminal device and a server, the server including a three-dimensional visual map of a target scene. The terminal device is configured to: during the process of moving in the target scene, acquire a target image of the target scene and motion data of the terminal device, and determine a self-positioning trajectory of the terminal device based on the target image and the motion data; and if the target image includes multiple frames of images, select a part of the frame images from the multiple frames of images as images to be tested, and send the images to be tested and the self-positioning trajectory to the server. The server is configured to: generate a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the fused positioning trajectory including a plurality of fused positioning poses; and for each fused positioning pose in the fused positioning trajectory, determine a target positioning pose corresponding to the fused positioning pose and display the target positioning pose.
The present application provides a pose display device applied to a server in a cloud edge management system, the server including a three-dimensional visual map of a target scene. The device includes: an acquisition module, configured to acquire an image to be tested and a self-positioning trajectory, where the self-positioning trajectory is determined by a terminal device based on a target image of the target scene and motion data of the terminal device, and the image to be tested is a partial frame image among the multiple frames of images included in the target image; a generation module, configured to generate a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, the fused positioning trajectory including a plurality of fused positioning poses; and a display module, configured to determine, for each fused positioning pose in the fused positioning trajectory, a target positioning pose corresponding to the fused positioning pose, and display the target positioning pose.
The present application provides a server, including a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor is configured to execute the machine-executable instructions to implement the pose display method according to the embodiments of the present application.
The present application provides a machine-readable storage medium on which computer instructions are stored; when the computer instructions are executed by a processor, the pose display method according to the embodiments of the present application can be implemented.
It can be seen from the above technical solutions that the embodiments of the present application propose a cloud-edge combined positioning and display method. The terminal device at the edge collects the target image and the motion data and performs high-frame-rate self-positioning based on them, obtaining a high-frame-rate self-positioning trajectory. The server in the cloud receives the images to be tested and the self-positioning trajectory sent by the terminal device, and obtains a high-frame-rate fused positioning trajectory based on them, that is, a high-frame-rate fused positioning trajectory in the 3D visual map, thereby realizing a high-frame-rate, high-precision positioning function and a high-precision, low-cost, easy-to-deploy vision-based indoor positioning capability, and displaying the fused positioning trajectory. In this approach, the terminal device calculates the high-frame-rate self-positioning trajectory and sends only the self-positioning trajectory and a small number of images to be tested, reducing the amount of data transmitted over the network; global positioning is performed on the server, reducing the consumption of computing resources and storage resources of the terminal device. The method can be applied in coal, electric power, petrochemical and other energy industries to realize indoor positioning of personnel (such as workers and inspection personnel), quickly obtain personnel location information, and ensure personnel safety.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a pose display method in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a cloud edge management system in an embodiment of the present application;
Fig. 3 is a schematic flowchart of determining a self-positioning trajectory in an embodiment of the present application;
Fig. 4 is a schematic flowchart of determining a global positioning trajectory in an embodiment of the present application;
Fig. 5 is a schematic diagram of a self-positioning trajectory, a global positioning trajectory and a fused positioning trajectory;
Fig. 6 is a schematic flowchart of determining a fused positioning trajectory in an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a pose display device in an embodiment of the present application.
Detailed description
The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments, rather than limiting the present application. The singular forms "a", "said" and "the" used in the present application and the claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
应当理解,尽管在本申请实施例可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本申请范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,此外,所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although terms such as first, second, and third may be used in the embodiment of the present application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, furthermore, the use of the word "if" could be interpreted as "at" or "when" or "in response to a determination."
An embodiment of the present application proposes a pose display method, which can be applied to a cloud-edge management system. The cloud-edge management system may include a terminal device (i.e., a terminal device at the edge) and a server (i.e., a server in the cloud), and the server may include a three-dimensional visual map of a target scene (such as an indoor environment or an outdoor environment). Referring to FIG. 1, which is a schematic flowchart of the pose display method, the method may include the following steps:
Step 101: During movement of the terminal device in the target scene, acquire a target image of the target scene and motion data of the terminal device, and determine a self-positioning trajectory of the terminal device based on the target image and the motion data.
Exemplarily, if the target image includes multiple frames of images, the terminal device traverses the multiple frames of images to obtain a current frame image; determines the self-positioning pose corresponding to the current frame image (i.e., the self-positioning pose of the terminal device) based on the self-positioning pose corresponding to each of the K (K may be a positive integer) frames of images preceding the current frame image, the map position (i.e., coordinate position) of the terminal device in the self-positioning coordinate system, and the motion data; and generates the self-positioning trajectory of the terminal device in the self-positioning coordinate system based on the self-positioning poses corresponding to the multiple frames of images.

A "pose" includes a position and an orientation, and the "self-positioning coordinate system" is a coordinate system established with the self-positioning pose corresponding to the first frame image among the multiple frames of images as the coordinate origin.

Exemplarily, if the current frame image is a key image, the map position of the terminal device in the self-positioning coordinate system may be generated based on the current position of the terminal device (i.e., the position corresponding to the current frame image). If the current frame image is a non-key image, the map position of the terminal device in the self-positioning coordinate system does not need to be generated based on the current position of the terminal device. The current position of the terminal device is, for example, the actual physical position of the terminal device when the current frame image is collected.

If the number of matched feature points between the current frame image and its previous frame image does not reach a preset threshold, the current frame image is determined to be a key image. If the number of matched feature points between the current frame image and its previous frame image reaches the preset threshold, the current frame image is determined to be a non-key image. Exemplarily, the first frame image may be used as a key image.
Step 102: If the target image includes multiple frames of images, the terminal device selects some of the multiple frames of images as images to be tested, and sends the images to be tested and the self-positioning trajectory to the server.

For example, the terminal device may select M frames of images from the multiple frames of images as images to be tested, where M may be a positive integer such as 1, 2 or 3. Obviously, what the terminal device sends to the server is only part of the multiple frames of images, which reduces the amount of data transmitted over the network and saves network bandwidth resources.
Step 103: The server generates a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, and the fused positioning trajectory may include multiple fused positioning poses.
Exemplarily, the server may determine target map points corresponding to the images to be tested from the three-dimensional visual map of the target scene, and determine a global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points. Then, the server generates the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory. Exemplarily, the frame rate of the fused positioning poses included in the fused positioning trajectory may be greater than the frame rate of the global positioning poses included in the global positioning trajectory, i.e., the frame rate of the fused positioning trajectory is higher than that of the global positioning trajectory. Here, the frame rate of poses refers to the frequency of pose output, i.e., the number of poses output by the system per second. The fused positioning trajectory may consist of high-frame-rate poses in the three-dimensional visual map, and the global positioning trajectory may consist of low-frame-rate poses in the three-dimensional visual map. The frame rate of the fused positioning trajectory being higher than that of the global positioning trajectory means that the number of fused positioning poses is greater than the number of global positioning poses. In addition, the frame rate of the fused positioning poses included in the fused positioning trajectory may be equal to the frame rate of the self-positioning poses included in the self-positioning trajectory, i.e., the frame rate of the fused positioning trajectory is equal to that of the self-positioning trajectory; that is, the self-positioning trajectory may also consist of high-frame-rate poses. The frame rate of the fused positioning trajectory being equal to that of the self-positioning trajectory means that the number of fused positioning poses is equal to the number of self-positioning poses.
In a possible implementation, the three-dimensional visual map may include, but is not limited to, at least one of the following: pose matrices corresponding to sample images, sample global descriptors corresponding to sample images, sample local descriptors corresponding to feature points in sample images, and map point information. The server determining the target map points corresponding to the image to be tested from the three-dimensional visual map of the target scene and determining the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points may include, but is not limited to: for each frame of image to be tested, selecting a candidate sample image from the multiple frames of sample images based on the similarity between the image to be tested and the multiple frames of sample images corresponding to the three-dimensional visual map; acquiring multiple feature points from the image to be tested; for each feature point, determining a target map point corresponding to the feature point from the multiple map points corresponding to the candidate sample image; determining a global positioning pose in the three-dimensional visual map corresponding to the image to be tested based on the multiple feature points and the target map points corresponding to the multiple feature points; and generating the global positioning trajectory of the terminal device in the three-dimensional visual map based on the global positioning poses corresponding to all the images to be tested.
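For illustration only, the last step (recovering a global pose from 2D feature points and their matched 3D map points) is commonly realized with a PnP solver plus RANSAC; the embodiment does not mandate a particular algorithm. A minimal sketch in Python with OpenCV, where all names and the choice of solver are assumptions:

```python
import cv2
import numpy as np

def estimate_global_pose(points_2d, points_3d, camera_matrix):
    """Estimate a global positioning pose from 2D feature points and their
    matched 3D map points using PnP + RANSAC (one possible solver)."""
    points_2d = np.asarray(points_2d, dtype=np.float64).reshape(-1, 1, 2)
    points_3d = np.asarray(points_3d, dtype=np.float64).reshape(-1, 1, 3)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d, points_2d, camera_matrix, distCoeffs=None)
    if not ok:
        return None  # global positioning failed for this frame
    rotation, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix
    return rotation, tvec              # pose = rotation + translation
```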
The server selecting the candidate sample image from the multiple frames of sample images based on the similarity between the image to be tested and the multiple frames of sample images corresponding to the three-dimensional visual map may include: determining a global descriptor to be tested corresponding to the image to be tested, and determining the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map, where the three-dimensional visual map includes at least the sample global descriptor corresponding to each frame of sample image; and selecting the candidate sample image from the multiple frames of sample images based on the distances between the global descriptor to be tested and the sample global descriptors, where the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum among the distances between the global descriptor to be tested and the sample global descriptors, and/or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than a distance threshold.
The server determining the global descriptor to be tested corresponding to the image to be tested may include, but is not limited to: determining a bag-of-words vector corresponding to the image to be tested based on a trained dictionary model, and determining the bag-of-words vector as the global descriptor to be tested corresponding to the image to be tested; or inputting the image to be tested into a trained deep learning model to obtain a target vector corresponding to the image to be tested, and determining the target vector as the global descriptor to be tested corresponding to the image to be tested. Of course, the above are merely examples of determining the global descriptor to be tested, and no limitation is imposed thereon.
The server determining the target map point corresponding to a feature point from the multiple map points corresponding to the candidate sample image may include, but is not limited to: determining a local descriptor to be tested corresponding to the feature point, where the local descriptor to be tested is used to represent the feature vector of the image block in which the feature point is located, and the image block may be located in the image to be tested; determining the distance between the local descriptor to be tested and the sample local descriptor corresponding to each map point corresponding to the candidate sample image, where the three-dimensional visual map includes at least the sample local descriptor corresponding to each map point corresponding to the candidate sample image; and then selecting the target map point from the multiple map points corresponding to the candidate sample image based on the distances between the local descriptor to be tested and the sample local descriptors, where the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point may be the minimum among the distances between the local descriptor to be tested and the sample local descriptors, and/or the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is smaller than a distance threshold.
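An illustrative sketch of this nearest-neighbor matching step, assuming descriptors are stored as NumPy vectors (the names are hypothetical, and the two criteria are combined here as described above):

```python
import numpy as np

def match_map_point(query_descriptor, map_point_descriptors, distance_threshold):
    """Pick the map point whose sample local descriptor is closest to the
    local descriptor to be tested; reject the match if even the closest
    descriptor does not fall within the distance threshold."""
    distances = np.linalg.norm(map_point_descriptors - query_descriptor, axis=1)
    best = int(np.argmin(distances))            # minimum-distance criterion
    if distances[best] >= distance_threshold:   # threshold criterion
        return None                             # no target map point found
    return best                                 # index of the target map point
```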
The server generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory may include, but is not limited to: the server selecting, from all the self-positioning poses included in the self-positioning trajectory, N self-positioning poses corresponding to a target time period, and selecting, from all the global positioning poses included in the global positioning trajectory, P global positioning poses corresponding to the target time period, where N and P are positive integers and N is greater than P; determining, based on the N self-positioning poses and the P global positioning poses, N fused positioning poses corresponding to the N self-positioning poses, the N self-positioning poses being in one-to-one correspondence with the N fused positioning poses; and generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the N fused positioning poses.
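The passage above does not spell out the fusion math. One plausible realization, sketched below under that assumption, estimates the rigid transform that best aligns the P self-positioning positions having global matches to the P global positions (Kabsch/Umeyama style), then maps all N self-positioning positions into the three-dimensional visual map; orientations are omitted for brevity:

```python
import numpy as np

def fuse_trajectories(self_positions, global_positions, global_indices):
    """Hypothetical fusion step (not quoted from the patent): align the
    self-positioning trajectory to the P global positions, then transform
    all N self-positioning positions to obtain N fused positions."""
    src = self_positions[global_indices]          # P x 3, self-positioning frame
    dst = global_positions                        # P x 3, 3D-visual-map frame
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    u, _, vt = np.linalg.svd(dst_c.T @ src_c)     # Kabsch rotation estimate
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt    # enforce a proper rotation
    translation = dst.mean(0) - rotation @ src.mean(0)
    return self_positions @ rotation.T + translation   # N fused positions
```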
After generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory, the server may further update the fused positioning trajectory. Specifically, the server may select an initial fused positioning pose from the fused positioning trajectory, and select an initial self-positioning pose corresponding to the initial fused positioning pose from the self-positioning trajectory; select a target self-positioning pose from the self-positioning trajectory, and determine a target fused positioning pose based on the initial fused positioning pose, the initial self-positioning pose and the target self-positioning pose; and then generate a new fused positioning trajectory based on the target fused positioning pose and the existing fused positioning trajectory, so as to replace the original fused positioning trajectory.
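One natural reading of this update step, using 4x4 homogeneous pose matrices, is to replay the relative motion between the initial and target self-positioning poses on top of the initial fused pose. The concrete formula below is an assumption, not quoted from the patent:

```python
import numpy as np

def extend_fused_pose(initial_fused, initial_self, target_self):
    """Hypothetical update: the relative motion measured in the
    self-positioning frame is applied to the initial fused pose to
    yield the target fused positioning pose."""
    relative_motion = np.linalg.inv(initial_self) @ target_self
    return initial_fused @ relative_motion   # target fused positioning pose
```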
Step 104: For each fused positioning pose in the fused positioning trajectory, the server determines a target positioning pose corresponding to the fused positioning pose, and displays the target positioning pose.

For example, the server may determine the fused positioning pose as the target positioning pose and display the target positioning pose in the three-dimensional visual map. Alternatively, based on a target transformation matrix between the three-dimensional visual map and a three-dimensional visualization map, the server converts the fused positioning pose into a target positioning pose in the three-dimensional visualization map, and displays the target positioning pose through the three-dimensional visualization map. The three-dimensional visual map is constructed by a visual mapping algorithm and is used only by the map positioning algorithm; the three-dimensional visualization map is a three-dimensional model used to show the three-dimensional structure of the scene to people.
Exemplarily, the manner of determining the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map may include, but is not limited to, the following. For each of multiple calibration points in the target scene, a coordinate pair corresponding to the calibration point may be determined, where the coordinate pair may include the position coordinates of the calibration point in the three-dimensional visual map and the position coordinates of the calibration point in the three-dimensional visualization map; the target transformation matrix is then determined based on the coordinate pairs corresponding to the multiple calibration points. Alternatively, an initial transformation matrix is obtained, position coordinates in the three-dimensional visual map are mapped to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix, and whether the initial transformation matrix has converged is determined based on the relationship between the mapped coordinates and the actual coordinates in the three-dimensional visualization map; if so, the initial transformation matrix is determined as the target transformation matrix; if not, the initial transformation matrix is adjusted, the adjusted transformation matrix is taken as the initial transformation matrix, and the operation of mapping the position coordinates in the three-dimensional visual map to the mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix is performed again, and so on, until the target transformation matrix is obtained. Alternatively, the three-dimensional visualization map is sampled to obtain a first point cloud corresponding to the three-dimensional visualization map, and the three-dimensional visual map is sampled to obtain a second point cloud corresponding to the three-dimensional visual map; the ICP algorithm is used to register the first point cloud and the second point cloud to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map.
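A minimal sketch of the third option using the Open3D library, assuming both maps have already been sampled into point arrays; the correspondence threshold and the identity initialization are illustrative assumptions:

```python
import numpy as np
import open3d as o3d

def register_maps(first_points, second_points, threshold=0.05):
    """Point-to-point ICP between the point cloud sampled from the 3D
    visualization map (first) and the one sampled from the 3D visual
    map (second); returns the 4x4 target transformation matrix."""
    first = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(first_points))
    second = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(second_points))
    result = o3d.pipelines.registration.registration_icp(
        second, first, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```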
As can be seen from the above technical solutions, an embodiment of the present application proposes a cloud-edge combined positioning and display method. A terminal device at the edge collects a target image and motion data, performs high-frame-rate self-positioning based on the target image and the motion data, and obtains a high-frame-rate self-positioning trajectory. A server in the cloud receives the images to be tested and the self-positioning trajectory sent by the terminal device, and obtains a high-frame-rate fused positioning trajectory, i.e., a high-frame-rate fused positioning trajectory in the three-dimensional visual map, based on the images to be tested and the self-positioning trajectory, thereby realizing a high-frame-rate, high-precision positioning function and a high-precision, low-cost, easy-to-deploy indoor positioning function. This is a vision-based indoor positioning method, and the fused positioning trajectory can be displayed in the three-dimensional visualization map. In the above manner, the terminal device computes the high-frame-rate self-positioning trajectory and sends only the self-positioning trajectory and a small number of images to be tested, which reduces the amount of data transmitted over the network. Global positioning is performed on the server, thereby reducing the computing and storage resource consumption of the terminal device. The method can be applied in coal, electric power, petrochemical and other energy industries to realize indoor positioning of personnel (such as workers and inspection personnel), quickly obtain personnel location information, and ensure personnel safety.
The pose display method according to the embodiments of the present application will be described below with reference to specific embodiments.
An embodiment of the present application proposes a cloud-edge combined visual positioning and display method. During movement of the terminal device in the target scene, the server determines the fused positioning trajectory of the terminal device in the three-dimensional visual map and displays the fused positioning trajectory. The target scene may be an indoor environment, i.e., when the terminal device moves in the indoor environment, the server determines the fused positioning trajectory of the terminal device in the three-dimensional visual map, which constitutes a vision-based indoor positioning method. Of course, the target scene may also be an outdoor environment, and no limitation is imposed thereon.
Referring to FIG. 2, which is a schematic structural diagram of the cloud-edge management system, the cloud-edge management system may include a terminal device (i.e., a terminal device at the edge) and a server (i.e., a server in the cloud). Of course, the cloud-edge management system may also include other devices, such as wireless base stations and routers, without limitation. The server may include a three-dimensional visual map of the target scene and a three-dimensional visualization map corresponding to the three-dimensional visual map. The server may generate the fused positioning trajectory of the terminal device in the three-dimensional visual map, and display the fused positioning trajectory in the three-dimensional visualization map (after converting it into a trajectory that can be displayed in the three-dimensional visualization map), so that management personnel can view the fused positioning trajectory in the three-dimensional visualization map through a web client.
The terminal device may include a visual sensor and a motion sensor. The visual sensor may be, for example, a camera, and is used to collect images of the target scene during movement of the terminal device. For convenience of distinction, such an image is denoted as a target image, and the target image includes multiple frames of images (i.e., multiple frames of real-time images collected during movement of the terminal device). The motion sensor may be, for example, an IMU (Inertial Measurement Unit), which is a measurement device including a gyroscope and an accelerometer, and is used to collect motion data of the terminal device, such as acceleration and angular velocity, during movement of the terminal device.
Exemplarily, the terminal device may be a wearable device (such as a video safety helmet, a smart watch or smart glasses), with the visual sensor and the motion sensor deployed on the wearable device; or the terminal device may be a recorder (a device carried by workers while performing their work, integrating functions such as real-time video and audio collection, photographing, audio recording, intercom and positioning), with the visual sensor and the motion sensor deployed on the recorder; or the terminal device may be a camera (such as a split camera), with the visual sensor and the motion sensor deployed on the camera. Of course, the above are merely examples, and the type of the terminal device is not limited; it may also be, for example, a smartphone, as long as the visual sensor and the motion sensor are deployed.
Exemplarily, the terminal device may acquire the target image and the motion data, perform high-frame-rate self-positioning based on the target image and the motion data, and obtain a high-frame-rate self-positioning trajectory (such as a 6DOF (six-degree-of-freedom) self-positioning trajectory). The self-positioning trajectory may include multiple self-positioning poses; since the self-positioning trajectory is a high-frame-rate self-positioning trajectory, the number of self-positioning poses in the self-positioning trajectory is relatively large.
The terminal device may select some of the multiple frames of the target image as images to be tested, and send the high-frame-rate self-positioning trajectory and the images to be tested to the server. The server obtains the self-positioning trajectory and the images to be tested, and may perform low-frame-rate global positioning based on the images to be tested and the three-dimensional visual map of the target scene to obtain a low-frame-rate global positioning trajectory (i.e., the global positioning trajectory of the terminal device in the three-dimensional visual map). The global positioning trajectory may include multiple global positioning poses; since the global positioning trajectory is a low-frame-rate global positioning trajectory, the number of global positioning poses in the global positioning trajectory is relatively small.
The server may fuse the high-frame-rate self-positioning trajectory with the low-frame-rate global positioning trajectory to obtain a high-frame-rate fused positioning trajectory, i.e., a high-frame-rate fused positioning trajectory in the three-dimensional visual map, i.e., a high-frame-rate fused positioning result. The fused positioning trajectory may include multiple fused positioning poses; since the fused positioning trajectory is a high-frame-rate fused positioning trajectory, the number of fused positioning poses in the fused positioning trajectory is relatively large.
In the above embodiments, a pose (such as a self-positioning pose, a global positioning pose or a fused positioning pose) may be a position and an orientation, which are generally represented by a rotation matrix and a translation vector, without limitation.
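For concreteness, a pose given as a rotation matrix and a translation vector is often packed into a single 4x4 homogeneous matrix; this common (but not mandated) convention is also what the pose sketches above assume:

```python
import numpy as np

def make_pose(rotation, translation):
    """Pack a 3x3 rotation matrix and a 3-vector translation into a
    4x4 homogeneous pose matrix."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose
```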
To sum up, in this embodiment, based on the target image and the motion data, a globally unified high-frame-rate visual positioning function can be realized, and a high-frame-rate fused positioning trajectory (such as 6DOF poses) in the three-dimensional visual map can be obtained. This is a globally consistent high-frame-rate positioning method, which realizes a high-frame-rate, high-precision, low-cost and easy-to-deploy indoor positioning function for terminal devices, i.e., a globally consistent high-frame-rate indoor positioning function.
The above process of the embodiments of the present application will be described in detail below in combination with specific application scenarios.
1. Self-positioning of the terminal device. The terminal device is an electronic device with a visual sensor and a motion sensor. It can acquire target images of the target scene (such as continuous video images) and motion data of the terminal device (such as IMU data), and determine the self-positioning trajectory of the terminal device based on the target images and the motion data.
The target image may include multiple frames of images. For each frame of image, the terminal device determines the self-positioning pose corresponding to that image, i.e., multiple frames of images correspond to multiple self-positioning poses. The self-positioning trajectory of the terminal device may include multiple self-positioning poses; it can be understood that the self-positioning trajectory is a collection of multiple self-positioning poses.

For the first frame image among the multiple frames of images, the terminal device determines the self-positioning pose corresponding to the first frame image; for the second frame image, the terminal device determines the self-positioning pose corresponding to the second frame image, and so on. The self-positioning pose corresponding to the first frame image may be the coordinate origin of the reference coordinate system (i.e., the self-positioning coordinate system). The self-positioning pose corresponding to the second frame image is a pose point in the reference coordinate system, i.e., a pose point relative to the coordinate origin (i.e., the self-positioning pose corresponding to the first frame image). The self-positioning pose corresponding to the third frame image is likewise a pose point in the reference coordinate system, i.e., a pose point relative to the coordinate origin, and so on; the self-positioning pose corresponding to each frame of image is a pose point in the reference coordinate system.

To sum up, after the self-positioning pose corresponding to each frame of image is obtained, these self-positioning poses can be composed into a self-positioning trajectory in the reference coordinate system, and the self-positioning trajectory includes these self-positioning poses.
In a possible implementation, referring to FIG. 3, the following steps are used to determine the self-positioning trajectory:
Step 301: Acquire a target image of the target scene and motion data of the terminal device.

Step 302: If the target image includes multiple frames of images, traverse the multiple frames of images to obtain a current frame image.

When the first frame image is traversed from the multiple frames of images as the current frame image, the self-positioning pose corresponding to the first frame image may be the coordinate origin of the reference coordinate system (i.e., the self-positioning coordinate system), i.e., the self-positioning pose coincides with the coordinate origin. When the second frame image is traversed as the current frame image, the subsequent steps may be used to determine the self-positioning pose corresponding to the second frame image. When the third frame image is traversed as the current frame image, the subsequent steps may be used to determine the self-positioning pose corresponding to the third frame image, and so on; each frame of image can be traversed as the current frame image.
Step 303: Use an optical flow algorithm to compute the feature point association between the current frame image and its previous frame image. The optical flow algorithm uses the temporal variation of pixels in the current frame image and the correlation between the current frame image and the previous frame image to find the correspondence between the two frames, thereby computing the motion information of objects between the current frame image and the previous frame image.
Step 304: Determine whether the current frame image is a key image based on the number of matched feature points between the current frame image and the previous frame image. For example, if the number of matched feature points between the current frame image and the previous frame image does not reach a preset threshold, this indicates that the current frame image changes considerably relative to the previous frame image, resulting in a relatively small number of matched feature points between the two frames; in this case, the current frame image is determined to be a key image, and step 305 is performed. If the number of matched feature points between the current frame image and the previous frame image reaches the preset threshold, this indicates that the current frame image changes little relative to the previous frame image, resulting in a relatively large number of matched feature points between the two frames; in this case, the current frame image is determined to be a non-key image, and step 306 is performed.

Exemplarily, a matching ratio between the current frame image and the previous frame image may also be computed based on the number of matched feature points between them, for example, the ratio of the number of matched feature points to the total number of feature points. If the matching ratio does not reach a preset ratio, the current frame image is determined to be a key image; if the matching ratio reaches the preset ratio, the current frame image is determined to be a non-key image.
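A minimal sketch of steps 303-304 using pyramidal Lucas-Kanade optical flow in OpenCV; the feature count and preset ratio below are illustrative assumptions, not values from the patent:

```python
import cv2
import numpy as np

def is_key_image(prev_gray, curr_gray, preset_ratio=0.5):
    """Track feature points from the previous frame into the current frame
    with optical flow, then apply the matching-ratio criterion."""
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=7)
    if prev_pts is None:
        return True  # nothing to track: treat as a key image
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None)
    matching_ratio = float(status.sum()) / len(prev_pts)
    return matching_ratio < preset_ratio  # large change => key image
```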
Step 305: If the current frame image is a key image, generate a map position in the self-positioning coordinate system (i.e., the reference coordinate system) based on the current position of the terminal device (i.e., the position of the terminal device when collecting the current frame image), i.e., generate a new 3D map position. If the current frame image is a non-key image, the map position of the terminal device in the self-positioning coordinate system does not need to be generated based on the current position of the terminal device.
Step 306: Determine the self-positioning pose corresponding to the current frame image based on the self-positioning pose corresponding to each of the K frames of images preceding the current frame image, the map position of the terminal device in the self-positioning coordinate system, and the motion data of the terminal device. K may be a positive integer, and may be a value configured according to experience, without limitation.

For example, all the motion data between the previous frame image and the current frame image may be pre-integrated to obtain an inertial measurement constraint between the two frames. Based on the self-positioning poses and motion data (such as velocity, acceleration and angular velocity) corresponding to the K frames of images (e.g., a sliding window) preceding the current frame image, the map positions in the self-positioning coordinate system, and the inertial measurement constraint (the velocity, acceleration, angular velocity, etc. between the previous frame image and the current frame image), bundle adjustment may be used to jointly optimize and update the state variables "the self-positioning poses and velocities corresponding to the K frames of images (e.g., the sliding window) preceding the current frame image", "the biases of the inertial measurement sensor" and "the map point positions in the self-positioning coordinate system", so as to obtain the self-positioning pose corresponding to the current frame image; no limitation is imposed on this bundle adjustment process.
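As a rough illustration of the pre-integration mentioned above (discrete accumulation of gyroscope and accelerometer samples between two image timestamps), the following is a schematic only: gravity compensation, bias handling and noise terms are omitted, so it is not the full formulation used in practice:

```python
import numpy as np

def preintegrate(samples, dt):
    """Schematic IMU pre-integration between two frames: accumulate
    rotation, velocity and position increments from (gyro, accel)
    samples taken at interval dt."""
    delta_r = np.eye(3)                 # accumulated rotation
    delta_v = np.zeros(3)               # accumulated velocity increment
    delta_p = np.zeros(3)               # accumulated position increment
    for gyro, accel in samples:
        delta_p += delta_v * dt + 0.5 * (delta_r @ accel) * dt ** 2
        delta_v += (delta_r @ accel) * dt
        angle = np.linalg.norm(gyro) * dt
        if angle > 1e-12:               # rotate by the gyro increment (Rodrigues)
            axis = gyro / np.linalg.norm(gyro)
            k = np.array([[0, -axis[2], axis[1]],
                          [axis[2], 0, -axis[0]],
                          [-axis[1], axis[0], 0]])
            delta_r = delta_r @ (np.eye(3) + np.sin(angle) * k
                                 + (1 - np.cos(angle)) * (k @ k))
    return delta_r, delta_v, delta_p    # inertial measurement constraint
```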
Exemplarily, in order to keep the scale of the variables to be optimized under control, a certain frame within the sliding window and some map positions may also be marginalized, and the corresponding constraint information is retained in the form of a prior.
Exemplarily, the terminal device may use a VIO (Visual Inertial Odometry) algorithm to determine the self-positioning pose; that is, the input data of the VIO algorithm are the target image and the motion data, and the output data of the VIO algorithm is the self-positioning pose. For example, based on the target image and the motion data, the VIO algorithm can obtain the self-positioning pose, e.g., steps 301 to 306 are performed using the VIO algorithm. The VIO algorithm may include, but is not limited to, VINS (Visual Inertial Navigation Systems), SVO (Semi-direct Visual Odometry), MSCKF (Multi-State Constraint Kalman Filter) and the like, which are not limited here, as long as the self-positioning pose can be obtained.
Step 307: Generate the self-positioning trajectory of the terminal device in the self-positioning coordinate system based on the self-positioning pose corresponding to each of the multiple frames of images, the self-positioning trajectory including multiple self-positioning poses in the self-positioning coordinate system.

At this point, the terminal device obtains the self-positioning trajectory in the self-positioning coordinate system, which may include the self-positioning pose corresponding to each of the multiple frames of images. Obviously, since the visual sensor can collect a large number of images, the terminal device can obtain the self-positioning poses corresponding to these images, i.e., the self-positioning trajectory can include a large number of self-positioning poses; in other words, the terminal device can obtain a high-frame-rate self-positioning trajectory.
2. Data transmission. If the target image includes multiple frames of images, the terminal device may select some of the multiple frames of images as images to be tested, and send the images to be tested and the self-positioning trajectory to the server. For example, the terminal device sends the self-positioning trajectory and the images to be tested to the server through a wireless network (such as 4G, 5G or Wi-Fi); since the frame rate of the images to be tested is low, the occupied network bandwidth is small.
3. Three-dimensional visual map of the target scene. The three-dimensional visual map of the target scene needs to be constructed in advance and stored on the server, so that the server can perform global positioning based on the three-dimensional visual map. The three-dimensional visual map is a storage form of the image information of the target scene; multiple frames of sample images of the target scene can be collected, and the three-dimensional visual map is constructed based on these sample images. For example, based on the multiple frames of sample images of the target scene, a visual mapping algorithm such as SFM (Structure From Motion) or SLAM (Simultaneous Localization And Mapping) may be used to construct the three-dimensional visual map of the target scene, and no limitation is imposed on the construction manner.
The three-dimensional visual map may include the following information:
Sample image poses: A sample image is a representative image used in constructing the three-dimensional visual map, i.e., the three-dimensional visual map can be constructed based on the sample images. The pose matrix of a sample image (referred to as the sample image pose for short) may be stored in the three-dimensional visual map, i.e., the three-dimensional visual map may include the sample image poses.
Sample global descriptors: Each frame of sample image may correspond to an image global descriptor, which is denoted as a sample global descriptor. The sample global descriptor represents the sample image with a high-dimensional vector, and is used to distinguish the image features of different sample images.
For each frame of sample image, the bag-of-words vector corresponding to the sample image may be determined based on a trained dictionary model, and the bag-of-words vector is determined as the sample global descriptor corresponding to the sample image. For example, the visual bag-of-words (Bag of Words) method is one way of determining global descriptors. In the visual bag-of-words method, a bag-of-words vector can be constructed, which is a vector representation used for image similarity detection, and the bag-of-words vector can be used as the sample global descriptor corresponding to the sample image.

In the visual bag-of-words method, a "dictionary", also called a dictionary model, needs to be trained in advance. Generally, feature point descriptors from a large number of images are clustered, and a classification tree is obtained through training; each class of the classification tree can represent a visual "word", and these visual "words" constitute the dictionary model.

For a sample image, all the feature point descriptors in the sample image can be classified into "words", and the occurrence frequencies of all the words are counted. In this way, the frequency of each word in the dictionary can constitute a vector, which is the bag-of-words vector corresponding to the sample image. The bag-of-words vector can be used to measure the degree of similarity between two frames of images, and is used as the sample global descriptor corresponding to the sample image.
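A minimal sketch of building such a bag-of-words vector, assuming the dictionary model has already been trained as a flat set of cluster centers (a real implementation would typically use a hierarchical vocabulary tree such as DBoW2):

```python
import numpy as np

def bag_of_words_vector(descriptors, word_centers):
    """Assign each feature point descriptor of an image to its nearest
    visual word and return the normalized word-frequency histogram,
    i.e. the bag-of-words vector used as the global descriptor."""
    histogram = np.zeros(len(word_centers))
    for d in descriptors:
        word = np.argmin(np.linalg.norm(word_centers - d, axis=1))
        histogram[word] += 1.0
    total = histogram.sum()
    return histogram / total if total > 0 else histogram
```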
Alternatively, for each frame of sample image, the sample image may be input into a trained deep learning model to obtain a target vector corresponding to the sample image, and the target vector is determined as the sample global descriptor corresponding to the sample image. For example, the deep learning method is another way of determining global descriptors. In the deep learning method, the sample image can be processed by multiple convolution layers of a deep learning model to finally obtain a high-dimensional target vector, which is used as the sample global descriptor corresponding to the sample image.

In the deep learning method, a deep learning model, such as a CNN (Convolutional Neural Network) model, needs to be trained in advance, generally using a large number of images; no limitation is imposed on the training manner of the deep learning model. For a sample image, the sample image can be input into the deep learning model, which processes the sample image to obtain a high-dimensional target vector, and the target vector is used as the sample global descriptor corresponding to the sample image.
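A sketch of this option in PyTorch, in which a pretrained CNN backbone stands in for the trained deep learning model; the choice of network is an assumption (dedicated image-retrieval networks such as NetVLAD are common in practice):

```python
import torch
import torchvision

# Stand-in for the trained model: a pretrained backbone whose pooled
# activations serve as the high-dimensional global descriptor.
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()  # drop the classifier, keep the 512-d vector
backbone.eval()

def global_descriptor(image_tensor):
    """image_tensor: 1x3xHxW, already normalized; returns a 512-d vector."""
    with torch.no_grad():
        vector = backbone(image_tensor)
    return torch.nn.functional.normalize(vector, dim=1)  # unit-length descriptor
```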
Sample local descriptors corresponding to feature points of sample images: Each frame of sample image may include multiple feature points. A feature point carries two parts of information: a distinctive pixel position in the sample image, and a descriptor describing the local neighborhood of that position. That is, the feature point may correspond to an image local descriptor, which is denoted as a sample local descriptor. The sample local descriptor uses a vector to describe the features of the image block in the vicinity of the feature point (i.e., the pixel position); this vector may also be called the descriptor of the feature point. In summary, the sample local descriptor is a feature vector used to represent the image block in which the feature point is located, and the image block may be located in the sample image. It should be noted that a feature point in a sample image (i.e., a two-dimensional feature point) may correspond to a map point in the three-dimensional visual map (i.e., a three-dimensional map point); therefore, the sample local descriptor corresponding to the feature point may also serve as the sample local descriptor corresponding to the map point corresponding to that feature point.
Algorithms such as ORB (Oriented FAST and Rotated BRIEF), SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features) may be used to extract feature points from the sample image and determine the sample local descriptors corresponding to the feature points. Deep learning algorithms (such as SuperPoint, DELF and D2-Net) may also be used to extract feature points from the sample image and determine the sample local descriptors corresponding to the feature points; no limitation is imposed on this, as long as the feature points can be obtained and the sample local descriptors can be determined.
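A minimal sketch using ORB, one of the algorithms listed above (SIFT/SURF or learned detectors such as SuperPoint could be substituted; the feature budget is an illustrative assumption):

```python
import cv2

def extract_local_features(gray_image, max_features=1000):
    """Extract feature points and their local descriptors with ORB."""
    orb = cv2.ORB_create(nfeatures=max_features)
    keypoints, descriptors = orb.detectAndCompute(gray_image, None)
    return keypoints, descriptors  # pixel positions + 32-byte descriptors
```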
Map point information: The map point information may include, but is not limited to, the 3D spatial position of the map point, all the sample images in which the map point is observed, and the indices of the corresponding 2D feature points (i.e., the feature points corresponding to the map point).
4. Global positioning by the server. Based on the acquired three-dimensional visual map of the target scene, after obtaining the images to be tested, the server determines the target map points corresponding to the images to be tested from the three-dimensional visual map of the target scene, and determines the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points.
For each frame of image to be tested, the server may determine the global positioning pose corresponding to the image to be tested. Assuming there are M frames of images to be tested, the M frames of images to be tested correspond to M global positioning poses, and the global positioning trajectory of the terminal device in the three-dimensional visual map may include the M global positioning poses; it can be understood that the global positioning trajectory is a collection of the M global positioning poses. For the first frame of image to be tested among the M frames, the global positioning pose corresponding to the first frame of image to be tested is determined; for the second frame of image to be tested, the global positioning pose corresponding to the second frame of image to be tested is determined, and so on. Each global positioning pose is a pose point in the three-dimensional visual map, i.e., a pose point in the coordinate system of the three-dimensional visual map. In summary, after the global positioning poses corresponding to the M frames of images to be tested are obtained, these global positioning poses are composed into the global positioning trajectory in the three-dimensional visual map, and the global positioning trajectory includes these global positioning poses.
Based on the three-dimensional visual map of the target scene, in a possible implementation, referring to FIG. 4, the server may determine the global positioning trajectory of the terminal device in the three-dimensional visual map by using the following steps:

Step 401: The server acquires images to be tested of the target scene from the terminal device.
Exemplarily, the terminal device may acquire the target image, the target image including multiple frames of images; the terminal device may select M frames of images from the multiple frames of images as images to be tested, and send the M frames of images to be tested to the server. For example, the multiple frames of images include key images and non-key images; on this basis, the terminal device may use the key images among the multiple frames of images as images to be tested, while the non-key images are not used as images to be tested. As another example, the terminal device may select images to be tested from the multiple frames of images at a fixed interval. Assuming the fixed interval is 5 (of course, the fixed interval can be arbitrarily configured according to experience, without limitation), the 1st frame image may be used as an image to be tested, the 6th (1+5) frame image may be used as an image to be tested, the 11th (6+5) frame image may be used as an image to be tested, and so on; one image to be tested is selected every 5 frames.
Step 402: For each frame of image to be tested, determine the global descriptor to be tested corresponding to the image to be tested.

Exemplarily, each frame of image to be tested may correspond to an image global descriptor, which is denoted as the global descriptor to be tested. The global descriptor to be tested represents the image to be tested with a high-dimensional vector, and is used to distinguish the image features of different images to be tested.

For each frame of image to be tested, the bag-of-words vector corresponding to the image to be tested is determined based on the trained dictionary model, and the bag-of-words vector is determined as the global descriptor to be tested corresponding to the image to be tested. Alternatively, for each frame of image to be tested, the image to be tested is input into the trained deep learning model to obtain the target vector corresponding to the image to be tested, and the target vector is determined as the global descriptor to be tested corresponding to the image to be tested.

In summary, the global descriptor to be tested corresponding to the image to be tested can be determined based on the visual bag-of-words method or the deep learning method; for the determination manner, refer to the determination manner of the sample global descriptor, which will not be repeated here.
Step 403: For each image to be tested, determine the similarity between its global descriptor to be tested and the sample global descriptor of each sample image corresponding to the 3D visual map.
Referring to the above embodiment, the 3D visual map includes the sample global descriptor corresponding to each sample image; therefore, the similarity between the global descriptor to be tested and each sample global descriptor can be determined. Taking "distance similarity" as an example, the distance between the global descriptor to be tested and each sample global descriptor can be computed, e.g. the Euclidean distance between the two feature vectors.
Step 404: Based on the distance between the global descriptor to be tested and each sample global descriptor, select candidate sample images from the multiple sample images corresponding to the 3D visual map; the distance between the global descriptor to be tested and a candidate sample image's sample global descriptor is the minimum among the distances to all sample global descriptors, and/or is smaller than a distance threshold.
For example, assume the 3D visual map corresponds to sample image 1, sample image 2 and sample image 3. Distance 1 between the global descriptor to be tested and sample image 1's sample global descriptor, distance 2 to sample image 2's, and distance 3 to sample image 3's can then be computed.

In a possible implementation manner, if distance 1 is the minimum distance, sample image 1 is selected as the candidate sample image. Alternatively, if distance 1 and distance 2 are both smaller than the distance threshold (which may be configured based on experience) but distance 3 is not, both sample image 1 and sample image 2 are selected as candidate sample images. Alternatively, if distance 1 is the minimum distance and is also smaller than the distance threshold, sample image 1 is selected as the candidate sample image; however, if distance 1 is the minimum distance but is not smaller than the distance threshold, no candidate sample image can be selected, i.e. relocation fails.

To sum up, for each image to be tested, at least one candidate sample image corresponding to it can be selected from the multiple sample images corresponding to the 3D visual map.
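A minimal sketch of one step-404 variant (keep every sample image below the distance threshold; fail when even the nearest exceeds it), assuming NumPy; names and the threshold value are illustrative:

```python
import numpy as np

def select_candidates(query_desc, sample_descs, dist_threshold=0.8):
    """Return indices of candidate sample images for one image to be
    tested, or an empty list when relocation fails."""
    dists = np.linalg.norm(sample_descs - query_desc, axis=1)
    if dists.min() >= dist_threshold:
        return []                                   # relocation failed
    return np.flatnonzero(dists < dist_threshold).tolist()
```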
Step 405: For each image to be tested, obtain multiple feature points from the image; for each feature point, determine its local descriptor to be tested, which is the feature vector representing the image block (located in the image to be tested) where the feature point lies.
For example, the image to be tested may include multiple feature points, each being a distinctive pixel position in the image. Each feature point corresponds to an image local descriptor, recorded as the local descriptor to be tested: a vector describing the features of the image block around the feature point (i.e. around that pixel position), also called the feature point's descriptor. In short, the local descriptor to be tested is the feature vector representing the image block where the feature point lies.

Algorithms such as ORB, SIFT or SURF can be used to extract feature points from the image to be tested and determine the local descriptors to be tested, as can deep learning methods such as SuperPoint, DELF or D2-Net. No limitation is placed on the choice, as long as the feature points and their local descriptors to be tested are obtained.
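A brief sketch using OpenCV's ORB, one of the algorithms named above (the feature budget of 1000 is an assumed parameter of this sketch):

```python
import cv2

def extract_local_features(image_path):
    """Detect feature points and compute their local descriptors with ORB;
    each 32-byte descriptor encodes the image block around its feature
    point, playing the role of the 'local descriptor to be tested'."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    return keypoints, descriptors
```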
Step 406: For each feature point of the image to be tested, determine the distance (e.g. the Euclidean distance between the two feature vectors) between that feature point's local descriptor to be tested and the sample local descriptor of each map point corresponding to the candidate sample image, i.e. the sample local descriptor of the map point corresponding to each feature point in the candidate sample image.
Referring to the above embodiment, for each sample image the 3D visual map includes the sample local descriptor of each map point corresponding to that sample image. Therefore, after the candidate sample image corresponding to the image to be tested is obtained, the sample local descriptor of each of its map points is acquired from the 3D visual map; then, for each feature point of the image to be tested, the distance between its local descriptor to be tested and each of those sample local descriptors is determined.
Step 407: For each feature point, based on the distance between its local descriptor to be tested and the sample local descriptor of each map point corresponding to the candidate sample image, select a target map point from the candidate sample image's map points; the distance between the local descriptor to be tested and the target map point's sample local descriptor is the minimum among the distances to all sample local descriptors, and/or is smaller than a distance threshold.
For example, assume the candidate sample image corresponds to map point 1, map point 2 and map point 3. Distance 1 between the feature point's local descriptor to be tested and map point 1's sample local descriptor, distance 2 to map point 2's, and distance 3 to map point 3's can then be computed.

In a possible implementation manner, if distance 1 is the minimum distance, map point 1 is selected as the target map point. Alternatively, if distance 1 and distance 2 are both smaller than the distance threshold (which may be configured based on experience) but distance 3 is not, both map point 1 and map point 2 are selected as target map points. Alternatively, if distance 1 is the minimum distance and is also smaller than the distance threshold, map point 1 is selected as the target map point; however, if distance 1 is the minimum distance but is not smaller than the distance threshold, no target map point can be selected, i.e. relocation fails.

To sum up, for each feature point of the image to be tested, the target map point corresponding to it is selected from the candidate sample image, yielding a matching relationship between the feature point and the target map point, as sketched below.
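A minimal sketch of the "minimum distance and below threshold" variant of step 407, assuming NumPy; names and the threshold value are illustrative:

```python
import numpy as np

def match_feature_to_map_point(feat_desc, map_point_descs, dist_threshold=0.7):
    """Return the index of the target map point for one feature point,
    or None when relocation fails for that feature."""
    dists = np.linalg.norm(map_point_descs - feat_desc, axis=1)
    best = int(dists.argmin())
    return best if dists[best] < dist_threshold else None
```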
Step 408: Based on the multiple feature points of the image to be tested and their corresponding target map points, determine the image's global positioning pose in the 3D visual map.
For one image to be tested, the image may correspond to multiple feature points, each corresponding to one target map point. For example, the target map point of feature point 1 is map point 1, that of feature point 2 is map point 2, and so on, yielding multiple matching pairs. Each matching pair includes one feature point (a 2D feature point marking a 2D position in the image to be tested) and one map point (a 3D map point marking a 3D position in the 3D visual map); that is, each matching pair encodes a mapping from a 2D position in the image to be tested to a 3D position in the 3D visual map.

If the total number of matching pairs does not reach the required number (a preset value), the global positioning pose of the image in the 3D visual map cannot be determined from them. If the total number does reach the required number, the global positioning pose in the 3D visual map corresponding to the image to be tested can be determined based on the multiple matching pairs.

For example, the PnP (Perspective-n-Point) algorithm can be used to compute the image's global positioning pose in the 3D visual map; the calculation method is not limited here. The PnP algorithm takes the matching pairs as input, each pair providing a 2D position in the image to be tested and a 3D position in the 3D visual map, and outputs the pose of the image to be tested in the 3D visual map, i.e. the global positioning pose.

To sum up, for each image to be tested, its global positioning pose in the 3D visual map, i.e. its pose in the 3D visual map coordinate system, is obtained.

In a possible implementation manner, after the matching pairs are obtained, the valid matching pairs may first be identified among them, and the PnP algorithm applied to only those valid pairs to compute the global positioning pose. For example, the RANSAC (RANdom SAmple Consensus) algorithm can be used to find the valid matching pairs among all matching pairs; this process is not limited here.
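A compact sketch of this final stage with OpenCV's RANSAC-based PnP solver; the camera intrinsic matrix K and the minimum pair count of 6 are assumptions of this sketch:

```python
import cv2
import numpy as np

def global_pose_from_matches(points_2d, points_3d, K):
    """Estimate the image's pose in the 3D visual map from 2D-3D matching
    pairs; RANSAC keeps only the valid pairs before the PnP solution."""
    if len(points_2d) < 6:              # too few matching pairs
        return None
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float32),
        np.asarray(points_2d, dtype=np.float32),
        K, distCoeffs=None)
    if not ok or inliers is None:
        return None
    R, _ = cv2.Rodrigues(rvec)          # rotation part of the solved pose
    return R, tvec
```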
Step 409: Generate the global positioning trajectory of the terminal device in the 3D visual map based on the global positioning poses corresponding to the M images to be tested acquired in step 401; the global positioning trajectory includes multiple global positioning poses in the 3D visual map. At this point, the server has obtained the global positioning trajectory in the 3D visual map, i.e. in the 3D visual map coordinate system, which includes the M global positioning poses corresponding to the M images to be tested. Since the M images to be tested are only a subset of all captured images, the global positioning trajectory includes the global positioning poses of a small number of images; that is, the server obtains a low-frame-rate global positioning trajectory.
5. Fusion positioning at the server. After obtaining the high-frame-rate self-positioning trajectory and the low-frame-rate global positioning trajectory, the server fuses them to obtain a high-frame-rate fused positioning trajectory in the 3D visual map coordinate system, i.e. the fused positioning trajectory of the terminal device in the 3D visual map. The fused positioning trajectory consists of high-frame-rate poses in the 3D visual map, while the global positioning trajectory consists of low-frame-rate poses; that is, the frame rate of the fused positioning trajectory is higher than that of the global positioning trajectory, and the number of fused positioning poses is greater than the number of global positioning poses.
As shown in FIG. 5, the white solid circles represent self-positioning poses, and a trajectory composed of multiple self-positioning poses is called the self-positioning trajectory. The self-positioning pose corresponding to the first frame, denoted $T^{L}_{1}$, may be taken as the coordinate origin of the reference coordinate system $S_L$ (i.e. the self-positioning coordinate system), so that $T^{L}_{1}$ coincides with the origin of $S_L$. Each self-positioning pose in the self-positioning trajectory is a pose in the reference coordinate system $S_L$.
The gray solid circles represent global positioning poses, and a trajectory composed of multiple global positioning poses is called the global positioning trajectory. Each global positioning pose in the global positioning trajectory is a pose in the 3D visual map coordinate system $S_G$, i.e. a global positioning pose under the 3D visual map.

The white dashed circles represent fused positioning poses, and a trajectory composed of multiple fused positioning poses is called the fused positioning trajectory. Each fused positioning pose in the fused positioning trajectory is likewise a pose in the 3D visual map coordinate system $S_G$, i.e. a fused positioning pose under the 3D visual map.

As shown in FIG. 5, since the target image includes multiple frames, each frame corresponds to one self-positioning pose, while only the subset of frames selected as images to be tested corresponds to global positioning poses; the number of self-positioning poses is therefore greater than the number of global positioning poses. When the fused positioning trajectory is obtained from the self-positioning trajectory and the global positioning trajectory, each self-positioning pose corresponds to one fused positioning pose (a one-to-one correspondence), so the number of fused positioning poses equals the number of self-positioning poses and is likewise greater than the number of global positioning poses.
In a possible implementation manner, the server may implement a trajectory fusion function and a pose transformation function. As shown in FIG. 6, the server may perform the following steps to implement these functions and obtain the fused positioning trajectory of the terminal device in the 3D visual map:
Step 601: Select the N self-positioning poses corresponding to a target time period from all self-positioning poses included in the self-positioning trajectory, and select the P global positioning poses corresponding to the target time period from all global positioning poses included in the global positioning trajectory; exemplarily, N may be greater than P.
For example, when fusing the self-positioning trajectory and the global positioning trajectory over the target time period, the N self-positioning poses corresponding to the target time period (i.e. the self-positioning poses determined from images collected during that period) and the P global positioning poses corresponding to the target time period (i.e. the global positioning poses determined from images collected during that period) are determined. As shown in FIG. 5, the self-positioning poses falling between the first and last self-positioning poses of the target time period are taken as the N self-positioning poses, and the global positioning poses falling between the first and last global positioning poses of the target time period are taken as the P global positioning poses.
Step 602: Determine, based on the N self-positioning poses and the P global positioning poses, the N fused positioning poses corresponding to the N self-positioning poses, with a one-to-one correspondence between self-positioning poses and fused positioning poses.
For example, as shown in FIG. 5, based on the N self-positioning poses and the P global positioning poses, the fused positioning pose $T^{F}_{1}$ corresponding to the self-positioning pose $T^{L}_{1}$ is determined, the fused positioning pose $T^{F}_{2}$ corresponding to $T^{L}_{2}$ is determined, the fused positioning pose $T^{F}_{3}$ corresponding to $T^{L}_{3}$ is determined, and so on.
In a possible implementation manner, assume there are N self-positioning poses, P global positioning poses and N fused positioning poses; the N self-positioning poses and the P global positioning poses are all known values, while the N fused positioning poses are unknown values to be solved. As shown in FIG. 5, each self-positioning pose corresponds to one fused positioning pose ($T^{L}_{1}$ to $T^{F}_{1}$, $T^{L}_{2}$ to $T^{F}_{2}$, $T^{L}_{3}$ to $T^{F}_{3}$, and so on), and each global positioning pose $T^{G}_{k}$ corresponds to the fused positioning pose of the same frame, and so on.
A first constraint value can be determined based on the N self-positioning poses and the N fused positioning poses; it represents the residual between the fused positioning poses and the self-positioning poses. For example, the first constraint value can be computed from the differences between corresponding pose pairs, i.e. between $T^{F}_{1}$ and $T^{L}_{1}$, between $T^{F}_{2}$ and $T^{L}_{2}$, ..., and between $T^{F}_{N}$ and $T^{L}_{N}$. The calculation formula of the first constraint value is not limited in this embodiment, as long as it is related to these differences.
A second constraint value can be determined based on the P global positioning poses and the corresponding P fused positioning poses (i.e. the P fused positioning poses selected from the N fused positioning poses that correspond to the P global positioning poses); it represents the residual (which may be an absolute difference) between the fused positioning poses and the global positioning poses. For example, the second constraint value can be computed from the differences between each global positioning pose $T^{G}_{k}$ and its corresponding fused positioning pose $T^{F}_{k}$. The calculation formula of the second constraint value is not limited in this embodiment, as long as it is related to these differences.
A target constraint value can then be computed from the first constraint value and the second constraint value, e.g. as their sum. Since the N self-positioning poses and the P global positioning poses are known while the N fused positioning poses are unknown, the values of the N fused positioning poses are adjusted so as to minimize the target constraint value. When the target constraint value is minimal, the values of the N fused positioning poses are the final solved pose values; at this point, the N fused positioning poses are obtained.
In a possible implementation manner, the target constraint value can be computed by formula (1):

$$F(T)=\sum_{i=1}^{N-1} e_{i,i+1}^{\top}\,\Omega_{i,i+1}\,e_{i,i+1}+\sum_{k=1}^{P} e_{k}^{\top}\,\Omega_{k}\,e_{k} \qquad (1)$$
In formula (1), F(T) denotes the target constraint value; the part before the plus sign (hereinafter the first part) is the first constraint value, and the part after the plus sign (hereinafter the second part) is the second constraint value. Of course, these are only examples of the target constraint value, the first constraint value and the second constraint value, and no limitation is placed on them.

$\Omega_{i,i+1}$ is the residual information matrix for the self-positioning poses and $\Omega_{k}$ is the residual information matrix for the global positioning poses; both may be configured based on experience, and no limitation is placed on them.

The first part expresses the relative-transformation constraint between the self-positioning poses and the fused positioning poses, reflected through the first constraint value; N is the number of all self-positioning poses in the self-positioning trajectory, i.e. the N self-positioning poses. The second part expresses the global positioning constraint between the global positioning poses and the fused positioning poses, reflected through the second constraint value; P is the number of all global positioning poses in the global positioning trajectory, i.e. the P global positioning poses.
The residuals $e_{i,i+1}$ in the first part and $e_{k}$ in the second part can be expressed by formulas (2) and (3), respectively:

$$e_{i,i+1}=\left(T^{F}_{i}\right)^{-1}T^{F}_{i+1}\left(\Delta T^{L}_{i,i+1}\right)^{-1}-I \qquad (2)$$

$$e_{k}=\left(T^{G}_{k}\right)^{-1}T^{F}_{k}-I \qquad (3)$$
In formulas (2) and (3), $T^{F}_{i}$ and $T^{F}_{i+1}$ are fused positioning poses (with no corresponding global positioning pose), $T^{L}_{i}$ and $T^{L}_{i+1}$ are the corresponding self-positioning poses, and $\Delta T^{L}_{i,i+1}=\left(T^{L}_{i}\right)^{-1}T^{L}_{i+1}$ is the relative pose-change constraint between the two self-positioning poses; $e_{i,i+1}$ is the residual between the relative pose change of $T^{F}_{i}$ and $T^{F}_{i+1}$ and the constraint $\Delta T^{L}_{i,i+1}$, and $I$ is an identity matrix.

$T^{F}_{k}$ is a fused positioning pose that has a corresponding global positioning pose $T^{G}_{k}$; $e_{k}$ is the residual of the fused positioning pose $T^{F}_{k}$ relative to the global positioning pose $T^{G}_{k}$.
Since the self-positioning poses and the global positioning poses are known while the fused positioning poses are unknown, the optimization objective is to minimize F(T), as in formula (4): $T^{*}=\arg\min_{T} F(T)$. By minimizing the value of F(T), the fused positioning trajectory in the 3D visual map coordinate system is obtained, and this fused positioning trajectory includes multiple fused positioning poses.
Exemplarily, to minimize the value of F(T), algorithms such as Gauss-Newton, gradient descent or LM (Levenberg-Marquardt) can be used to solve for the fused positioning poses, which is not repeated here.
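As a position-only simplification of the objective in formulas (1)-(4) (a full implementation would optimize poses on SE(3); SciPy is assumed, and all names are illustrative), the fused positions can be solved as a least-squares problem whose residuals are the two constraint terms:

```python
import numpy as np
from scipy.optimize import least_squares

def fuse_trajectory(self_xyz, global_xyz, global_idx):
    """Solve for N fused positions that preserve the self-positioning
    relative motion (first term of F(T)) while staying close to the P
    global positions (second term of F(T)).

    self_xyz:   (N, 3) known self-positioning positions.
    global_xyz: (P, 3) known global positions in the map frame.
    global_idx: length-P indices of the frames the global fixes belong to.
    """
    n = len(self_xyz)

    def residuals(flat):
        fused = flat.reshape(n, 3)
        # Relative-transform constraint (first term of F(T)).
        e_rel = (fused[1:] - fused[:-1]) - (self_xyz[1:] - self_xyz[:-1])
        # Global-positioning constraint (second term of F(T)).
        e_glob = fused[global_idx] - global_xyz
        return np.concatenate([e_rel.ravel(), e_glob.ravel()])

    # Levenberg-Marquardt, initialized from the self-positioning trajectory.
    sol = least_squares(residuals, self_xyz.ravel(), method="lm")
    return sol.x.reshape(n, 3)

# Toy usage: 5 self poses drifting diagonally, global fixes at frames 0 and 4.
self_xyz = np.cumsum(np.full((5, 3), 0.1), axis=0)
fused = fuse_trajectory(self_xyz,
                        np.array([[0.1, 0.1, 0.1], [0.5, 0.45, 0.45]]),
                        np.array([0, 4]))
```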
Step 603: Generate the fused positioning trajectory of the terminal device in the 3D visual map based on the N fused positioning poses; the fused positioning trajectory includes the N fused positioning poses in the 3D visual map.
At this point, the server has obtained the fused positioning trajectory in the 3D visual map, i.e. in the 3D visual map coordinate system. The number of fused positioning poses in the fused positioning trajectory is greater than the number of global positioning poses in the global positioning trajectory; that is, a high-frame-rate fused positioning trajectory is obtained.
Step 604: Select an initial fused positioning pose from the fused positioning trajectory, and select the initial self-positioning pose corresponding to the initial fused positioning pose from the self-positioning trajectory.
Step 605: Select a target self-positioning pose from the self-positioning trajectory, and determine the target fused positioning pose based on the initial fused positioning pose, the initial self-positioning pose and the target self-positioning pose.
Exemplarily, after the fused positioning trajectory is generated, it may also be updated. During the trajectory update, an initial fused positioning pose is selected from the fused positioning trajectory, an initial self-positioning pose is selected from the self-positioning trajectory, and a target self-positioning pose is selected from the self-positioning trajectory. On this basis, the target fused positioning pose can be determined from the initial fused positioning pose, the initial self-positioning pose and the target self-positioning pose. A new fused positioning trajectory is then generated based on the target fused positioning pose and the existing fused positioning trajectory, replacing the original fused positioning trajectory.
For example, in steps 601-603, as shown in FIG. 5, the self-positioning trajectory includes the self-positioning poses $T^{L}_{1},\dots,T^{L}_{N}$, the global positioning trajectory includes the global positioning poses $T^{G}_{1},\dots,T^{G}_{P}$, and the fused positioning trajectory includes the fused positioning poses $T^{F}_{1},\dots,T^{F}_{N}$. If a new self-positioning pose $T^{L}_{j}$ is subsequently obtained but has no corresponding global positioning pose, the fused positioning pose $T^{F}_{j}$ corresponding to $T^{L}_{j}$ cannot be determined from a global positioning pose together with $T^{L}_{j}$. On this basis, in this embodiment, the fused positioning pose $T^{F}_{j}$ can also be determined by the following formula (5):

$$T^{F}_{j}=T^{F}_{i}\left(T^{L}_{i}\right)^{-1}T^{L}_{j} \qquad (5)$$
In formula (5), $T^{F}_{j}$ denotes the fused positioning pose corresponding to the self-positioning pose $T^{L}_{j}$, i.e. the target fused positioning pose; $T^{F}_{i}$ denotes a fused positioning pose selected from the fused positioning trajectory, i.e. the initial fused positioning pose; $T^{L}_{i}$ denotes the self-positioning pose selected from the self-positioning trajectory that corresponds to $T^{F}_{i}$, i.e. the initial self-positioning pose; and $T^{L}_{j}$ denotes the target self-positioning pose selected from the self-positioning trajectory. In summary, the target fused positioning pose $T^{F}_{j}$ can be determined from the initial fused positioning pose $T^{F}_{i}$, the initial self-positioning pose $T^{L}_{i}$ and the target self-positioning pose $T^{L}_{j}$.
After the target fused positioning pose $T^{F}_{j}$ is obtained, a new fused positioning trajectory including $T^{F}_{j}$ can be generated, thereby updating the fused positioning trajectory.
In the above process, steps 601-603 constitute the trajectory fusion process and steps 604-605 constitute the pose transformation process. Trajectory fusion registers and fuses the self-positioning trajectory with the global positioning trajectory, converting the self-positioning trajectory from the self-positioning coordinate system into the 3D visual map coordinate system and correcting the trajectory with the global positioning results; whenever a new frame yields a global positioning result, one round of trajectory fusion is performed. Since not every frame can successfully obtain a global positioning result, the poses of the remaining frames are output as fused positioning poses in the 3D visual map coordinate system through pose transformation, i.e. the pose transformation process, as sketched below.
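A one-line sketch of the pose transformation of formula (5), assuming all poses are expressed as 4x4 homogeneous matrices (NumPy assumed; names are illustrative):

```python
import numpy as np

def transform_pose(T_f_i, T_l_i, T_l_j):
    """Formula (5): T_F_j = T_F_i . inv(T_L_i) . T_L_j. The relative motion
    from the initial self-positioning pose to the new one is replayed on
    top of the initial fused pose, carrying the new frame into the map frame."""
    return T_f_i @ np.linalg.inv(T_l_i) @ T_l_j
```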
6. The 3D visualization map of the target scene. A 3D visualization map of the target scene needs to be built in advance and stored on the server, so that the server can display trajectories based on it. The 3D visualization map is a viewable 3D map of the target scene, used mainly for trajectory display; it can be obtained through laser scanning and manual modeling, or, for example, through a mapping algorithm. This application places no limitation on how the 3D visualization map is constructed.
Given the 3D visual map and the 3D visualization map of the target scene, the two maps need to be registered so that they are spatially aligned. For example, the 3D visualization map is sampled to convert it from triangular-patch form into dense point-cloud form; this point cloud is registered against the 3D point cloud of the 3D visual map using the ICP (Iterative Closest Point) algorithm, yielding the transformation matrix T from the 3D visualization map to the 3D visual map. Finally, the transformation matrix T is used to transform the 3D visualization map into the 3D visual map coordinate system, obtaining a 3D visualization map aligned with the 3D visual map.
Exemplarily, the transformation matrix T (recorded as the target transformation matrix) can be determined in the following ways:
Method 1: When building the 3D visual map and the 3D visualization map, multiple calibration points can be deployed in the target scene (different calibration points can be distinguished by different shapes, so that they can be recognized from images), and both the 3D visual map and the 3D visualization map include these calibration points. For each calibration point, a coordinate pair is determined, consisting of the calibration point's position coordinates in the 3D visual map and its position coordinates in the 3D visualization map. The target transformation matrix can then be determined from the coordinate pairs of the multiple calibration points. For example, the target transformation matrix T may be an m*n transformation matrix, and the transformation relationship between the two maps may be W = Q*T, where W denotes position coordinates in the 3D visualization map and Q denotes position coordinates in the 3D visual map. Substituting the coordinate pairs of the multiple calibration points into this formula (the calibration points' coordinates in the 3D visual map as Q and their coordinates in the 3D visualization map as W) yields the target transformation matrix T; this process is not repeated here.
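A minimal least-squares sketch of method 1 (a homogeneous affine form is assumed for T, i.e. 4 rows and 3 columns; names are illustrative):

```python
import numpy as np

def fit_map_transform(Q_visual, W_visualized):
    """Solve W = [Q | 1] . T in the least-squares sense from matched
    calibration-point coordinate pairs: Q_visual (m, 3) in the 3D visual
    map, W_visualized (m, 3) in the 3D visualization map.
    Returns the (4, 3) affine transform T."""
    Q_h = np.hstack([Q_visual, np.ones((len(Q_visual), 1))])
    T, *_ = np.linalg.lstsq(Q_h, W_visualized, rcond=None)
    return T
```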
Method 2: Obtain an initial transformation matrix; map the position coordinates in the 3D visual map to mapped coordinates in the 3D visualization map based on the initial transformation matrix; and determine, based on the relationship between the mapped coordinates and the actual coordinates in the 3D visualization map, whether the initial transformation matrix has converged. If so, the initial transformation matrix is determined as the target transformation matrix. If not, the initial transformation matrix is adjusted, the adjusted matrix is taken as the new initial transformation matrix, and the mapping operation is performed again, and so on, until the target transformation matrix is obtained.

For example, an initial transformation matrix is obtained first; there is no limitation on how it is obtained, as it may be set randomly or produced by some algorithm. This matrix is iteratively optimized, and the optimized matrix is taken as the target transformation matrix.

After the initial transformation matrix is obtained, the position coordinates in the 3D visual map are mapped to mapped coordinates in the 3D visualization map. For example, using the relationship W = Q*T, taking the position coordinates in the 3D visual map as Q and the initial transformation matrix as T gives the position coordinates in the 3D visualization map (recorded as mapped coordinates for ease of distinction). Whether the initial transformation matrix has converged is then determined from the relationship between the mapped coordinates and the actual coordinates in the 3D visualization map. The mapped coordinates are coordinates converted through the initial transformation matrix, while the actual coordinates are the true coordinates in the 3D visualization map corresponding to the position coordinates in the 3D visual map. The smaller the difference between the mapped coordinates and the actual coordinates, the more accurate the initial transformation matrix; the larger the difference, the less accurate it is. Based on this principle, convergence can be judged from the difference between the mapped coordinates and the actual coordinates.

For example, if the difference between the mapped coordinates and the actual coordinates (which may be the sum of multiple groups of differences, each group corresponding to one mapped-actual coordinate pair) is smaller than a threshold, the initial transformation matrix is determined to have converged; otherwise, it is determined not to have converged.

If the initial transformation matrix has not converged, it is adjusted (the adjustment process is not limited; for example, the ICP algorithm may be used), the adjusted matrix is taken as the new initial transformation matrix, and the mapping operation is performed again, and so on, until the target transformation matrix is obtained. If the initial transformation matrix has converged, it is determined as the target transformation matrix.
Method 3: Sample the 3D visualization map to obtain a first point cloud corresponding to it, and sample the 3D visual map to obtain a second point cloud corresponding to it. The ICP algorithm is used to register the first point cloud against the second point cloud, yielding the target transformation matrix between the 3D visual map and the 3D visualization map. Since both point clouds contain a large number of 3D points, the ICP algorithm can perform registration on them directly; no limitation is placed on this registration process.
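A minimal point-to-point ICP sketch for method 3, using brute-force nearest neighbors and an SVD (Kabsch) update; a production version would use a k-d tree and outlier rejection, so this is only an illustrative skeleton:

```python
import numpy as np

def icp(source, target, iterations=30):
    """Iteratively align the source cloud (n, 3) to the target cloud (m, 3)
    and return the accumulated rigid transform (R, t)."""
    R, t = np.eye(3), np.zeros(3)
    src = source.copy()
    for _ in range(iterations):
        # Nearest-neighbor correspondences (brute force for clarity).
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        nn = target[d.argmin(axis=1)]
        # Kabsch step: best rigid transform aligning src to its matches.
        mu_s, mu_t = src.mean(axis=0), nn.mean(axis=0)
        H = (src - mu_s).T @ (nn - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:       # guard against reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_t - R_step @ mu_s
        src = src @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```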
7. Trajectory display. After obtaining the fused positioning trajectory, for each fused positioning pose in it, the server can convert the fused positioning pose into a target positioning pose in the 3D visualization map based on the target transformation matrix between the 3D visual map and the 3D visualization map, and display the target positioning pose through the 3D visualization map. On this basis, a manager can open a Web browser and access the server over the network to view the target positioning poses displayed in the 3D visualization map, which together form a trajectory. By reading and rendering the 3D visualization map, the server displays the terminal device's target positioning poses on it for the manager to view, and the manager can change the viewing angle by dragging the mouse, achieving 3D viewing of the trajectory. For example, the server includes client software that reads and renders the 3D visualization map and displays the target positioning poses on it; a user (such as a manager) accesses the client software through a Web browser to view the target positioning poses displayed in the 3D visualization map, and can change the viewing angle of the 3D visualization map by dragging the mouse while doing so.
It can be seen from the above technical solutions that the embodiments of this application propose a cloud-edge combined positioning and display method: the terminal device computes the high-frame-rate self-positioning trajectory and sends only the self-positioning trajectory and a small number of images to be tested, reducing the amount of data transmitted over the network, while global positioning is performed on the server, reducing the computing and storage resource consumption of the terminal device. The cloud-edge fusion system architecture spreads the computing load, reduces the hardware cost of terminal devices and reduces the amount of data transmitted over the network. The final positioning result is displayed in a 3D visualization map, and managers access the server through the Web for interactive display.
基于与上述方法同样的申请构思,本申请实施例中提出一种云边管理***,所述云边管理***包括终端设备和服务器,所述服务器包括目标场景的三维视觉地图。所述终端设备用于在目标场景中移动的过程中,获取所述目标场景的目标图像和所述终端设备的运动数据,基于所述目标图像和所述运动数据确定所述终端设备的自定位轨迹;若所述目标图像包括多帧图像,则从所述多帧图像中选取部分帧图像作为待测图像,将所述待测图像和所述自定位轨迹发送给所述服务器。所述服务器用于基于所述待测图像和所述自定位轨迹生成所述终端设备在所述三维视觉地图中的融合定位轨迹,所述融合定位轨迹包括多个融合定位位姿;针对所述融合定位轨迹中的每个融合定位位姿,确定与该融合定位位姿对应的目标定位位姿,并显示所述目标定位位姿。Based on the same application concept as the above method, an embodiment of the present application proposes a cloud edge management system, the cloud edge management system includes a terminal device and a server, and the server includes a three-dimensional visual map of a target scene. The terminal device is used to acquire a target image of the target scene and motion data of the terminal device during the process of moving in the target scene, and determine a self-positioning trajectory of the terminal device based on the target image and the motion data ; If the target image includes multiple frames of images, select a part of frame images from the multiple frames of images as the image to be tested, and send the image to be tested and the self-positioning trajectory to the server. The server is configured to generate a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, and the fusion positioning trajectory includes a plurality of fusion positioning poses; for the Fusing each fused positioning pose in the fused positioning trajectory, determining a target positioning pose corresponding to the fused positioning pose, and displaying the target positioning pose.
Exemplarily, the terminal device includes a visual sensor for acquiring the target image of the target scene and a motion sensor for acquiring the motion data of the terminal device. The terminal device may be a wearable device with the visual sensor and the motion sensor deployed on the wearable device; or a recorder with the visual sensor and the motion sensor deployed on the recorder; or a camera with the visual sensor and the motion sensor deployed on the camera.
Exemplarily, when generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the server is specifically configured to: determine, from the three-dimensional visual map, target map points corresponding to the images to be tested, and determine a global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points; and generate the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory, where the frame rate of the fused positioning poses included in the fused positioning trajectory is greater than the frame rate of the global positioning poses included in the global positioning trajectory, and the frame rate of the fused positioning poses included in the fused positioning trajectory is equal to the frame rate of the self-positioning poses included in the self-positioning trajectory.
Exemplarily, when determining the target positioning pose corresponding to a fused positioning pose and displaying the target positioning pose, the server is specifically configured to: convert the fused positioning pose into the target positioning pose in the three-dimensional visualization map based on the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map. The server includes client software that reads and renders the three-dimensional visualization map and displays the target positioning pose on it; a user accesses the client software through a web browser to view the target positioning pose displayed in the three-dimensional visualization map, and while doing so can change the viewing angle of the three-dimensional visualization map by dragging the mouse.
Based on the same application concept as the above method, an embodiment of this application proposes a pose display apparatus, applied to the server in a cloud-edge management system, the server including a three-dimensional visual map of a target scene; FIG. 7 shows a structural diagram of the pose display apparatus. The pose display apparatus includes: an acquisition module 71, configured to acquire images to be tested and a self-positioning trajectory, where the self-positioning trajectory is determined by the terminal device based on the target image of the target scene and the motion data of the terminal device, and the images to be tested are some of the frames among the multiple frames of images included in the target image; a generation module 72, configured to generate a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the fused positioning trajectory including a plurality of fused positioning poses; and a display module 73, configured to, for each fused positioning pose in the fused positioning trajectory, determine a target positioning pose corresponding to that fused positioning pose and display the target positioning pose.
Exemplarily, when generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the generation module 72 is specifically configured to: determine, from the three-dimensional visual map, target map points corresponding to the images to be tested, and determine the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points; and generate the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory. The frame rate of the fused positioning poses included in the fused positioning trajectory is greater than the frame rate of the global positioning poses included in the global positioning trajectory, and equal to the frame rate of the self-positioning poses included in the self-positioning trajectory.
Exemplarily, the three-dimensional visual map includes at least one of the following: pose matrices corresponding to sample images, sample global descriptors corresponding to sample images, sample local descriptors corresponding to feature points in sample images, and map point information. When determining the target map points corresponding to the images to be tested from the three-dimensional visual map and determining the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points, the generation module 72 is specifically configured to: for each frame of image to be tested, select a candidate sample image from the multiple frames of sample images corresponding to the three-dimensional visual map based on the similarity between the image to be tested and those sample images; acquire a plurality of feature points from the image to be tested; for each feature point, determine a target map point corresponding to that feature point from the multiple map points corresponding to the candidate sample image; determine the global positioning pose of the image to be tested in the three-dimensional visual map based on the plurality of feature points and the target map points corresponding to them; and generate the global positioning trajectory of the terminal device in the three-dimensional visual map based on the global positioning poses corresponding to all images to be tested.
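Once the feature points of an image to be tested have been paired with target map points, solving for the global positioning pose is a classical Perspective-n-Point (PnP) problem. The sketch below is one way this step could be realized, assuming pinhole intrinsics K and precomputed 2D-3D correspondences; the application does not prescribe this particular solver:

```python
import cv2
import numpy as np

def estimate_global_pose(pts_2d: np.ndarray, pts_3d: np.ndarray, K: np.ndarray):
    """Estimate the global positioning pose from 2D-3D correspondences.
    pts_2d: (N, 2) feature points in the image to be tested.
    pts_3d: (N, 3) target map points in the 3D visual map.
    K:      (3, 3) camera intrinsic matrix."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64), K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)     # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)                  # homogeneous pose (map frame -> camera frame)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T
```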
Exemplarily, when selecting the candidate sample image from the multiple frames of sample images based on the similarity between the image to be tested and the sample images corresponding to the three-dimensional visual map, the generation module 72 is specifically configured to: determine the global descriptor to be tested corresponding to the image to be tested, and determine the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map; and select the candidate sample image from the multiple frames of sample images based on the distances between the global descriptor to be tested and the individual sample global descriptors, where the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum among the distances between the global descriptor to be tested and the individual sample global descriptors, and/or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than a distance threshold.
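The candidate selection step is in effect a nearest-neighbor search over global descriptors. A compact sketch under the assumption of L2 distance (the application does not fix the metric), combining the minimum-distance rule with the optional threshold:

```python
import numpy as np
from typing import Optional

def select_candidate(query_desc: np.ndarray, sample_descs: np.ndarray,
                     threshold: Optional[float] = None) -> Optional[int]:
    """Return the index of the sample image whose global descriptor is
    closest (L2) to the query descriptor; optionally reject the match if
    even the minimum distance is not below the threshold."""
    dists = np.linalg.norm(sample_descs - query_desc, axis=1)
    best = int(np.argmin(dists))
    if threshold is not None and dists[best] >= threshold:
        return None
    return best
```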
Exemplarily, when determining the global descriptor to be tested corresponding to the image to be tested, the generation module 72 is specifically configured to: determine a bag-of-words vector corresponding to the image to be tested based on a trained dictionary model, and determine the bag-of-words vector as the global descriptor to be tested corresponding to the image to be tested; or input the image to be tested into a trained deep learning model to obtain a target vector corresponding to the image to be tested, and determine the target vector as the global descriptor to be tested corresponding to the image to be tested.
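For the dictionary route, the global descriptor is a fixed-length histogram of an image's local descriptors over a visual vocabulary. A toy sketch, assuming the trained dictionary model is a set of visual words such as k-means centroids (one common construction, not specified by the application):

```python
import numpy as np

def bow_vector(local_descs: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Quantize an image's local descriptors against a visual vocabulary and
    return an L2-normalized word-frequency histogram as its global descriptor.
    local_descs: (N, D) local descriptors; vocabulary: (K, D) visual words."""
    # Assign each descriptor to its nearest visual word.
    d = np.linalg.norm(local_descs[:, None, :] - vocabulary[None, :, :], axis=2)
    words = np.argmin(d, axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    return hist / (np.linalg.norm(hist) + 1e-12)
```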
Exemplarily, when determining, from the multiple map points corresponding to the candidate sample image, the target map point corresponding to a feature point, the generation module 72 is specifically configured to: determine a local descriptor to be tested corresponding to the feature point, the local descriptor to be tested representing a feature vector of the image block in which the feature point is located, the image block lying within the image to be tested; determine the distance between the local descriptor to be tested and the sample local descriptor corresponding to each map point corresponding to the candidate sample image; and select the target map point from the multiple map points corresponding to the candidate sample image based on the distances between the local descriptor to be tested and the individual sample local descriptors, where the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is the minimum among the distances between the local descriptor to be tested and the sample local descriptors corresponding to the individual map points corresponding to the candidate sample image, and/or the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is smaller than a distance threshold.
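Feature-to-map-point matching mirrors the global retrieval step, applied per local descriptor. A compact sketch that matches all feature descriptors of an image in one pass, again assuming L2 distance and a single distance threshold:

```python
import numpy as np
from typing import List, Tuple

def match_features_to_map(feat_descs: np.ndarray, map_descs: np.ndarray,
                          threshold: float) -> List[Tuple[int, int]]:
    """For each feature descriptor of the image to be tested, find the
    nearest sample local descriptor among the candidate image's map points;
    keep the pair only if the minimum L2 distance is below the threshold.
    Returns (feature_index, map_point_index) pairs."""
    d = np.linalg.norm(feat_descs[:, None, :] - map_descs[None, :, :], axis=2)
    nearest = np.argmin(d, axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest) if d[i, j] < threshold]
```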
Exemplarily, when generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory, the generation module 72 is specifically configured to: select, from all self-positioning poses included in the self-positioning trajectory, N self-positioning poses corresponding to a target time period, and select, from all global positioning poses included in the global positioning trajectory, P global positioning poses corresponding to the target time period, where N is greater than P; determine, based on the N self-positioning poses and the P global positioning poses, N fused positioning poses corresponding to the N self-positioning poses, the N self-positioning poses corresponding one-to-one to the N fused positioning poses; and generate the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the N fused positioning poses.
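One simple way to realize such a fusion — offered here only as an illustration, since the application leaves the fusion algorithm open — is to compute, at each of the P global poses, a correction that maps the self-positioning frame into the map frame, and to apply the most recent correction to each of the N self-positioning poses:

```python
import numpy as np

def fuse_trajectory(self_poses, global_poses, anchor_idx):
    """self_poses: N 4x4 self-positioning poses (self-positioning frame).
    global_poses: P 4x4 global poses in the 3D visual map, P < N.
    anchor_idx:   for each global pose, the index of the self pose it matches.
    A correction T_corr = T_global @ inv(T_self) is computed at each anchor
    and the most recent correction is applied to every self pose."""
    corrections = [g @ np.linalg.inv(self_poses[i])
                   for g, i in zip(global_poses, anchor_idx)]
    fused, k = [], 0
    for i, T in enumerate(self_poses):
        while k + 1 < len(anchor_idx) and anchor_idx[k + 1] <= i:
            k += 1                      # latest anchor at or before pose i
        fused.append(corrections[k] @ T)
    return fused
```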
Exemplarily, when determining the target positioning pose corresponding to a fused positioning pose and displaying the target positioning pose, the display module 73 is specifically configured to: convert the fused positioning pose into the target positioning pose in the three-dimensional visualization map based on the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map. The display module 73 is further configured to determine the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map as follows: for each of multiple calibration points in the target scene, determine a coordinate pair corresponding to that calibration point, the coordinate pair including the position coordinates of the calibration point in the three-dimensional visual map and its position coordinates in the three-dimensional visualization map, and determine the target transformation matrix based on the coordinate pairs corresponding to the multiple calibration points. Alternatively, obtain an initial transformation matrix, map position coordinates in the three-dimensional visual map to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix, and determine, based on the relationship between the mapped coordinates and the actual coordinates in the three-dimensional visualization map, whether the initial transformation matrix has converged; if so, determine the initial transformation matrix as the target transformation matrix; if not, adjust the initial transformation matrix, take the adjusted transformation matrix as the initial transformation matrix, and return to the operation of mapping the position coordinates in the three-dimensional visual map to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix. Alternatively, sample the three-dimensional visualization map to obtain a first point cloud corresponding to the three-dimensional visualization map, sample the three-dimensional visual map to obtain a second point cloud corresponding to the three-dimensional visual map, and register the first point cloud and the second point cloud using the iterative closest point (ICP) algorithm to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map.
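The first of these options, solving for a rigid transform from paired calibration coordinates, is the classical absolute-orientation problem with a well-known closed-form solution (Kabsch/Umeyama). A minimal sketch under that assumption; the application names no specific solver:

```python
import numpy as np

def transform_from_point_pairs(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Closed-form rigid transform (Kabsch) mapping calibration points src
    (3D visual map) onto dst (3D visualization map). src, dst: (N, 3), N >= 3."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                           # proper rotation, det(R) = +1
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, mu_d - R @ mu_s
    return T
```

The third option could likewise be realized with an off-the-shelf ICP routine from a point-cloud library once both maps have been sampled into point clouds.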
Based on the same application concept as the above method, an embodiment of this application proposes a server. The server may include a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor is configured to execute the machine-executable instructions to implement the pose display method disclosed in the above examples of this application.
Based on the same application concept as the above method, an embodiment of this application further provides a machine-readable storage medium on which a number of computer instructions are stored; when the computer instructions are executed by a processor, the pose display method disclosed in the above examples of this application can be implemented.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device, and may contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be a RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disk (such as an optical disc or DVD), or a similar storage medium, or a combination thereof.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail transceiver device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of functions divided into various units. Of course, when implementing this application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The above descriptions are only embodiments of this application and are not intended to limit this application. For those skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall be included within the scope of the claims of this application.

Claims (19)

1. A pose display method, characterized in that the method is applied to a cloud-edge management system, the cloud-edge management system comprises a terminal device and a server, the server comprises a three-dimensional visual map of a target scene, and the method comprises:
    acquiring, by the terminal device during movement in the target scene, a target image of the target scene and motion data of the terminal device, and determining a self-positioning trajectory of the terminal device based on the target image and the motion data; if the target image comprises multiple frames of images, selecting some of the frames from the multiple frames of images as images to be tested, and sending the images to be tested and the self-positioning trajectory to the server;
    generating, by the server, a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the fused positioning trajectory comprising a plurality of fused positioning poses; and
    for each fused positioning pose in the fused positioning trajectory, determining, by the server, a target positioning pose corresponding to that fused positioning pose, and displaying the target positioning pose.
2. The method according to claim 1, characterized in that determining, by the terminal device, the self-positioning trajectory of the terminal device based on the target image and the motion data comprises:
    traversing, by the terminal device, a current frame image from the multiple frames of images;
    determining a self-positioning pose corresponding to the current frame image based on the self-positioning pose corresponding to each of the K frames of images preceding the current frame image, a map position of the terminal device in a self-positioning coordinate system, and the motion data, where K is a positive integer; and
    generating the self-positioning trajectory of the terminal device in the self-positioning coordinate system based on the self-positioning poses corresponding to the multiple frames of images;
    wherein, if the current frame image is a key image, the map position of the terminal device in the self-positioning coordinate system is generated based on a current position of the terminal device; and the current frame image is determined to be a key image if the number of matching feature points between the current frame image and the frame image immediately preceding it does not reach a preset threshold.
3. The method according to claim 1, characterized in that generating, by the server, the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory comprises:
    determining, from the three-dimensional visual map, target map points corresponding to the images to be tested, and determining a global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points; and
    generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory; wherein the frame rate of the fused positioning poses comprised in the fused positioning trajectory is greater than the frame rate of the global positioning poses comprised in the global positioning trajectory, and the frame rate of the fused positioning poses comprised in the fused positioning trajectory is equal to the frame rate of the self-positioning poses comprised in the self-positioning trajectory.
4. The method according to claim 3, characterized in that the three-dimensional visual map comprises at least one of the following: pose matrices corresponding to sample images, sample global descriptors corresponding to sample images, sample local descriptors corresponding to feature points in sample images, and map point information; and determining, by the server, the target map points corresponding to the images to be tested from the three-dimensional visual map and determining the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points comprises:
    for each frame of image to be tested, selecting, by the server, a candidate sample image from the multiple frames of sample images corresponding to the three-dimensional visual map based on the similarity between the image to be tested and the multiple frames of sample images;
    acquiring, by the server, a plurality of feature points from the image to be tested;
    for each feature point, determining a target map point corresponding to that feature point from the multiple map points corresponding to the candidate sample image;
    determining a global positioning pose of the image to be tested in the three-dimensional visual map based on the plurality of feature points and the target map points corresponding to the plurality of feature points; and
    generating the global positioning trajectory of the terminal device in the three-dimensional visual map based on the global positioning poses corresponding to all images to be tested.
5. The method according to claim 4, characterized in that selecting, by the server, the candidate sample image from the multiple frames of sample images based on the similarity between the image to be tested and the multiple frames of sample images corresponding to the three-dimensional visual map comprises:
    determining a global descriptor to be tested corresponding to the image to be tested, and determining the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map; and
    selecting the candidate sample image from the multiple frames of sample images based on the distances between the global descriptor to be tested and the individual sample global descriptors; wherein the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum among the distances between the global descriptor to be tested and the individual sample global descriptors, and/or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than a distance threshold.
6. The method according to claim 5, characterized in that determining the global descriptor to be tested corresponding to the image to be tested comprises:
    determining a bag-of-words vector corresponding to the image to be tested based on a trained dictionary model, and determining the bag-of-words vector as the global descriptor to be tested corresponding to the image to be tested; or
    inputting the image to be tested into a trained deep learning model to obtain a target vector corresponding to the image to be tested, and determining the target vector as the global descriptor to be tested corresponding to the image to be tested.
7. The method according to claim 4, characterized in that determining, by the server, the target map point corresponding to a feature point from the multiple map points corresponding to the candidate sample image comprises:
    determining a local descriptor to be tested corresponding to the feature point, the local descriptor to be tested representing a feature vector of the image block in which the feature point is located, the image block lying within the image to be tested;
    determining the distance between the local descriptor to be tested and the sample local descriptor corresponding to each map point corresponding to the candidate sample image; and
    selecting the target map point from the multiple map points corresponding to the candidate sample image based on the distances between the local descriptor to be tested and the individual sample local descriptors; wherein the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is the minimum among the distances between the local descriptor to be tested and the sample local descriptors corresponding to the individual map points corresponding to the candidate sample image, and/or the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target map point is smaller than a distance threshold.
8. The method according to claim 3, characterized in that generating, by the server, the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory comprises:
    selecting, by the server, N self-positioning poses corresponding to a target time period from all self-positioning poses comprised in the self-positioning trajectory, and selecting P global positioning poses corresponding to the target time period from all global positioning poses comprised in the global positioning trajectory, wherein N and P are positive integers and N is greater than P;
    determining, based on the N self-positioning poses and the P global positioning poses, N fused positioning poses corresponding to the N self-positioning poses, the N self-positioning poses corresponding one-to-one to the N fused positioning poses; and
    generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the N fused positioning poses.
9. The method according to claim 1, characterized in that:
    determining, by the server, the target positioning pose corresponding to the fused positioning pose and displaying the target positioning pose comprises: converting the fused positioning pose into the target positioning pose in a three-dimensional visualization map based on a target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and displaying the target positioning pose through the three-dimensional visualization map;
    wherein the manner of determining the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map comprises:
    for each of multiple calibration points in the target scene, determining a coordinate pair corresponding to that calibration point, the coordinate pair comprising the position coordinates of the calibration point in the three-dimensional visual map and the position coordinates of the calibration point in the three-dimensional visualization map, and determining the target transformation matrix based on the coordinate pairs corresponding to the multiple calibration points;
    or, obtaining an initial transformation matrix, mapping position coordinates in the three-dimensional visual map to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix, and determining, based on the relationship between the mapped coordinates and the actual coordinates in the three-dimensional visualization map, whether the initial transformation matrix has converged; if so, determining the initial transformation matrix as the target transformation matrix; if not, adjusting the initial transformation matrix, taking the adjusted transformation matrix as the initial transformation matrix, and returning to the operation of mapping the position coordinates in the three-dimensional visual map to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix;
    or, sampling the three-dimensional visualization map to obtain a first point cloud corresponding to the three-dimensional visualization map, sampling the three-dimensional visual map to obtain a second point cloud corresponding to the three-dimensional visual map, and registering the first point cloud and the second point cloud using an iterative closest point (ICP) algorithm to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map.
  10. 一种云边管理***,其特征在于,所述云边管理***包括终端设备和服务器,所述服务器包括目标场景的三维视觉地图,其中:A cloud edge management system, characterized in that the cloud edge management system includes a terminal device and a server, and the server includes a three-dimensional visual map of a target scene, wherein:
    所述终端设备用于在目标场景中移动的过程中,获取所述目标场景的目标图像和所述终端设备的运动数据,基于所述目标图像和所述运动数据确定所述终端设备的自定位轨迹;若所述目标图像包括多帧图像,则从所述多帧图像中选取部分帧图像作为待测图像,将所述待测图像和所述自定位轨迹发送给所述服务器;The terminal device is used to acquire a target image of the target scene and motion data of the terminal device during the process of moving in the target scene, and determine a self-positioning trajectory of the terminal device based on the target image and the motion data ; If the target image includes a multi-frame image, select a part of the frame image from the multi-frame image as the image to be tested, and send the image to be tested and the self-positioning trajectory to the server;
    所述服务器用于基于所述待测图像和所述自定位轨迹生成所述终端设备在所述三维视觉地图中的融合定位轨迹,所述融合定位轨迹包括多个融合定位位姿;针对所述融合定位轨迹中的每个融合定位位姿,确定与该融合定位位姿对应的目标定位位姿,并显示所述目标定位位姿。The server is configured to generate a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the image to be tested and the self-positioning trajectory, and the fusion positioning trajectory includes a plurality of fusion positioning poses; for the Fusing each fused positioning pose in the fused positioning trajectory, determining a target positioning pose corresponding to the fused positioning pose, and displaying the target positioning pose.
  11. 根据权利要求10所述的***,其特征在于,所述终端设备包括视觉传感器和运动传感器;其中,所述视觉传感器用于获取所述目标场景的所述目标图像,所述运动传感器用于获取所述终端设备的所述运动数据;The system according to claim 10, wherein the terminal device includes a vision sensor and a motion sensor; wherein the vision sensor is used to acquire the target image of the target scene, and the motion sensor is used to acquire the motion data of the terminal device;
    其中,所述终端设备为可穿戴设备,且所述视觉传感器和所述运动传感器部署在所述可穿戴设备上;或者,所述终端设备为记录仪,且所述视觉传感器和所述运动传感器部署在所述记录仪上; 或者,所述终端设备为摄像机,且所述视觉传感器和所述运动传感器部署在所述摄像机上。Wherein, the terminal device is a wearable device, and the visual sensor and the motion sensor are deployed on the wearable device; or, the terminal device is a recorder, and the visual sensor and the motion sensor deployed on the recorder; or, the terminal device is a camera, and the vision sensor and the motion sensor are deployed on the camera.
  12. 根据权利要求10所述的***,其特征在于,所述服务器基于所述待测图像和所述自定位轨迹生成所述终端设备在所述三维视觉地图中的融合定位轨迹时具体用于:The system according to claim 10, wherein the server is specifically used to:
    从所述三维视觉地图中确定出与所述待测图像对应的目标地图点,基于所述目标地图点确定所述终端设备在所述三维视觉地图中的全局定位轨迹;Determining a target map point corresponding to the image-to-be-tested from the three-dimensional visual map, and determining a global positioning track of the terminal device in the three-dimensional visual map based on the target map point;
    基于所述自定位轨迹和所述全局定位轨迹生成所述终端设备在所述三维视觉地图中的融合定位轨迹;其中,所述融合定位轨迹包括的融合定位位姿的帧率大于所述全局定位轨迹包括的全局定位位姿的帧率;所述融合定位轨迹包括的融合定位位姿的帧率等于所述自定位轨迹包括的自定位位姿的帧率。Generate a fusion positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory; wherein, the frame rate of the fusion positioning pose included in the fusion positioning trajectory is greater than that of the global positioning The frame rate of the global positioning pose included in the track; the frame rate of the fused positioning pose included in the fusion positioning track is equal to the frame rate of the self positioning pose included in the self positioning track.
  13. 根据权利要求10所述的***,其特征在于,所述服务器确定与该融合定位位姿对应的目标定位位姿,并显示所述目标定位位姿时具体用于:The system according to claim 10, wherein the server determines the target positioning pose corresponding to the fusion positioning pose, and when displaying the target positioning pose is specifically used for:
    基于所述三维视觉地图与三维可视化地图之间的目标变换矩阵,将所述融合定位位姿转换为所述三维可视化地图中的目标定位位姿,并通过所述三维可视化地图显示所述目标定位位姿;Based on the target transformation matrix between the 3D visual map and the 3D visual map, convert the fusion positioning pose into the target positioning pose in the 3D visual map, and display the target positioning through the 3D visual map pose;
    其中,所述服务器包括客户端软件,所述客户端软件读取所述三维可视化地图并进行渲染,并将所述目标定位位姿显示到所述三维可视化地图;Wherein, the server includes client software, and the client software reads and renders the three-dimensional visualization map, and displays the target positioning pose on the three-dimensional visualization map;
    其中,用户通过Web浏览器访问所述客户端软件,以通过所述客户端软件查看所述三维可视化地图中显示的所述目标定位位姿;Wherein, the user accesses the client software through a web browser, so as to view the target positioning pose displayed in the three-dimensional visualization map through the client software;
    其中,在通过所述客户端软件查看所述三维可视化地图中显示的所述目标定位位姿时,通过鼠标拖动改变所述三维可视化地图的查看视角。Wherein, when the target positioning pose displayed in the three-dimensional visualization map is viewed through the client software, the viewing angle of the three-dimensional visualization map is changed by dragging a mouse.
14. A pose display apparatus, characterized in that the apparatus is applied to a server in a cloud-edge management system, the server comprising a three-dimensional visual map of a target scene, and the apparatus comprises:
    an acquisition module, configured to acquire images to be tested and a self-positioning trajectory, wherein the self-positioning trajectory is determined by a terminal device based on a target image of the target scene and motion data of the terminal device, and the images to be tested are some of the frames among the multiple frames of images comprised in the target image;
    a generation module, configured to generate a fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the fused positioning trajectory comprising a plurality of fused positioning poses; and
    a display module, configured to, for each fused positioning pose in the fused positioning trajectory, determine a target positioning pose corresponding to that fused positioning pose and display the target positioning pose.
15. The apparatus according to claim 14, characterized in that, when generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the generation module is specifically configured to:
    determine, from the three-dimensional visual map, target map points corresponding to the images to be tested, and determine a global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points; and
    generate the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory; wherein the frame rate of the fused positioning poses comprised in the fused positioning trajectory is greater than the frame rate of the global positioning poses comprised in the global positioning trajectory, and the frame rate of the fused positioning poses comprised in the fused positioning trajectory is equal to the frame rate of the self-positioning poses comprised in the self-positioning trajectory;
    wherein the three-dimensional visual map comprises at least one of the following: pose matrices corresponding to sample images, sample global descriptors corresponding to sample images, sample local descriptors corresponding to feature points in sample images, and map point information; and, when determining the target map points corresponding to the images to be tested from the three-dimensional visual map and determining the global positioning trajectory of the terminal device in the three-dimensional visual map based on the target map points, the generation module is specifically configured to:
    select a candidate sample image from the multiple frames of sample images corresponding to the three-dimensional visual map based on the similarity between the image to be tested and the multiple frames of sample images;
    acquire a plurality of feature points from the image to be tested;
    for each feature point, determine a target map point corresponding to that feature point from the multiple map points corresponding to the candidate sample image;
    determine a global positioning pose of the image to be tested in the three-dimensional visual map based on the plurality of feature points and the target map points corresponding to the plurality of feature points; and
    generate the global positioning trajectory of the terminal device in the three-dimensional visual map based on the global positioning poses corresponding to all images to be tested.
16. The apparatus according to claim 15, characterized in that, when generating the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the self-positioning trajectory and the global positioning trajectory, the generation module is specifically configured to:
    select N self-positioning poses corresponding to a target time period from all self-positioning poses comprised in the self-positioning trajectory, and select P global positioning poses corresponding to the target time period from all global positioning poses comprised in the global positioning trajectory, wherein N and P are positive integers and N is greater than P;
    determine, based on the N self-positioning poses and the P global positioning poses, N fused positioning poses corresponding to the N self-positioning poses, the N self-positioning poses corresponding one-to-one to the N fused positioning poses; and
    generate the fused positioning trajectory of the terminal device in the three-dimensional visual map based on the N fused positioning poses.
17. The apparatus according to claim 14, characterized in that:
    when determining the target positioning pose corresponding to the fused positioning pose and displaying the target positioning pose, the display module is specifically configured to: convert the fused positioning pose into the target positioning pose in a three-dimensional visualization map based on a target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map, and display the target positioning pose through the three-dimensional visualization map;
    wherein the display module is further configured to determine the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map in the following manner:
    for each of multiple calibration points in the target scene, determining a coordinate pair corresponding to that calibration point, the coordinate pair comprising the position coordinates of the calibration point in the three-dimensional visual map and the position coordinates of the calibration point in the three-dimensional visualization map, and determining the target transformation matrix based on the coordinate pairs corresponding to the multiple calibration points;
    or, obtaining an initial transformation matrix, mapping position coordinates in the three-dimensional visual map to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix, and determining, based on the relationship between the mapped coordinates and the actual coordinates in the three-dimensional visualization map, whether the initial transformation matrix has converged; if so, determining the initial transformation matrix as the target transformation matrix; if not, adjusting the initial transformation matrix, taking the adjusted transformation matrix as the initial transformation matrix, and returning to the operation of mapping the position coordinates in the three-dimensional visual map to mapped coordinates in the three-dimensional visualization map based on the initial transformation matrix;
    or, sampling the three-dimensional visualization map to obtain a first point cloud corresponding to the three-dimensional visualization map, sampling the three-dimensional visual map to obtain a second point cloud corresponding to the three-dimensional visual map, and registering the first point cloud and the second point cloud using an iterative closest point (ICP) algorithm to obtain the target transformation matrix between the three-dimensional visual map and the three-dimensional visualization map.
18. A server, comprising a processor and a machine-readable storage medium, characterized in that the machine-readable storage medium stores machine-executable instructions executable by the processor, and the processor is configured to execute the machine-executable instructions to perform the following operations:
    receiving images to be tested sent by a terminal device and a self-positioning trajectory of the terminal device, wherein the self-positioning trajectory is determined based on a target image of a target scene acquired by the terminal device during movement in the target scene and motion data of the terminal device, and the images to be tested are some of the frames selected by the terminal device from the multiple frames of images comprised in the target image;
    generating a fused positioning trajectory of the terminal device in a three-dimensional visual map based on the images to be tested and the self-positioning trajectory, the fused positioning trajectory comprising a plurality of fused positioning poses; and
    for each fused positioning pose in the fused positioning trajectory, determining a target positioning pose corresponding to that fused positioning pose, and displaying the target positioning pose.
19. A machine-readable storage medium, on which computer instructions are stored, wherein, when the computer instructions are executed by a processor, the pose display method according to any one of claims 1 to 9 is implemented.
PCT/CN2022/131134 2021-11-15 2022-11-10 Pose display method and apparatus, and system, server and storage medium WO2023083256A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111350621.9 2021-11-15
CN202111350621.9A CN114185073A (en) 2021-11-15 2021-11-15 Pose display method, device and system

Publications (1)

Publication Number Publication Date
WO2023083256A1 (en)

Family

ID=80540921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131134 WO2023083256A1 (en) 2021-11-15 2022-11-10 Pose display method and apparatus, and system, server and storage medium

Country Status (2)

Country Link
CN (1) CN114185073A (en)
WO (1) WO2023083256A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114185073A (en) * 2021-11-15 2022-03-15 杭州海康威视数字技术股份有限公司 Pose display method, device and system
CN117346650A (en) * 2022-06-28 2024-01-05 中兴通讯股份有限公司 Pose determination method and device for visual positioning and electronic equipment
CN118279385A (en) * 2022-12-30 2024-07-02 优奈柯恩(北京)科技有限公司 Method, apparatus, system, device, and medium for determining relative pose

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080167814A1 (en) * 2006-12-01 2008-07-10 Supun Samarasekera Unified framework for precise vision-aided navigation
CN105143821A (en) * 2013-04-30 2015-12-09 高通股份有限公司 Wide area localization from SLAM maps
CN107818592A (en) * 2017-11-24 2018-03-20 北京华捷艾米科技有限公司 Method, system and the interactive system of collaborative synchronous superposition
CN113382365A (en) * 2021-05-21 2021-09-10 北京索为云网科技有限公司 Pose tracking method and device of mobile terminal
CN114120301A (en) * 2021-11-15 2022-03-01 杭州海康威视数字技术股份有限公司 Pose determination method, device and equipment
CN114185073A (en) * 2021-11-15 2022-03-15 杭州海康威视数字技术股份有限公司 Pose display method, device and system

Also Published As

Publication number Publication date
CN114185073A (en) 2022-03-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892049

Country of ref document: EP

Kind code of ref document: A1

WD Withdrawal of designations after international publication