CN116989772A - Air-ground multi-mode multi-agent cooperative positioning and mapping method

Air-ground multi-mode multi-agent cooperative positioning and mapping method

Info

Publication number
CN116989772A
CN116989772A (application CN202311248105.4A)
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, map, optimization, fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311248105.4A
Other languages
Chinese (zh)
Other versions
CN116989772B (en)
Inventor
张金会
魏嘉桐
吕千一
李思杭
孟焕
蔡吉山
邵之玥
赵凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202311248105.4A priority Critical patent/CN116989772B/en
Publication of CN116989772A publication Critical patent/CN116989772A/en
Application granted granted Critical
Publication of CN116989772B publication Critical patent/CN116989772B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/005: Navigation with correlation of navigation data from several sources, e.g. map or contour matching
    • G01C 21/12: Navigation by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16: Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165: Inertial navigation combined with non-inertial navigation instruments
    • G01C 21/1656: Inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • G01C 21/38: Electronic maps specially adapted for navigation; Updating thereof
    • G01C 21/3804: Creation or updating of map data
    • G01C 21/3826: Creation or updating of map data characterised by the type of data: terrain data
    • G01C 21/3841: Creation or updating of map data characterised by the source of data: data obtained from two or more sources, e.g. probe vehicles
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The application provides an air-ground multi-mode multi-agent collaborative positioning and mapping method, which comprises: acquiring measurement data of the agents, wherein the agents comprise an unmanned aerial vehicle and an unmanned ground vehicle, and the unmanned ground vehicle carries a visual marker; building a local-view local map from the measurement data of the unmanned ground vehicle, and building a global-view local map from the measurement data of the unmanned aerial vehicle; detecting the visual marker with the unmanned aerial vehicle, acquiring the relative pose between the unmanned aerial vehicle and the unmanned ground vehicle when detection succeeds, and performing air-ground-view map fusion and optimization using the transformation given by the relative pose; continuously detecting, based on loop closure, whether the agents pass through an overlapping area, and, when an overlapping area is detected, associating the two agents' maps through the matched keyframes to perform trajectory calibration and similar-view map fusion and optimization; and obtaining a globally consistent pose trajectory and map from the map produced by the air-ground-view fusion and optimization and the map produced by the similar-view fusion and optimization.

Description

Air-ground multi-mode multi-agent cooperative positioning and mapping method
Technical Field
The application belongs to the field of positioning and mapping of unmanned systems.
Background
With continuous breakthroughs in intelligent robotics, intelligent robots are increasingly applied to complex problems in modern society, and autonomous navigation capability is regarded as the foundation for robots to carry out tasks autonomously. To realize autonomous planning and navigation, cooperative simultaneous localization and mapping (SLAM) of intelligent robots has become a current research hotspot, with great potential in fields such as military operations and search and rescue in disaster environments.
At present, mutual pose correction and shared perception among multiple robots remain difficult. In the SLAM process a robot cannot obtain information about the whole environment in advance and must construct the map itself. A traditional single robot performing SLAM in a large-scale scene suffers from many limitations, such as a small sensor range, a limited observation angle, high computational complexity and weak storage capacity. Relying only on local sensor information, the robot's positioning error grows steadily and eventually causes drift in mapping and localization. Some special tasks, such as military operations and disaster search and rescue, must be completed in as short a time as possible. For these problems, multiple robots dispersed in the environment that communicate and cooperate with one another give the whole system stronger environment-detection capability and more accurate positioning capability.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art to some extent.
Therefore, the application aims to provide an air-ground multi-mode multi-agent collaborative positioning and mapping method for constructing a globally consistent environment map model.
In order to achieve the above objective, an embodiment of a first aspect of the present application provides a method for collaborative positioning and mapping of air-ground multi-mode multi-agents, including:
acquiring measurement data of the agents, wherein the agents comprise an unmanned aerial vehicle and an unmanned ground vehicle, and the unmanned ground vehicle is provided with a visual marker;
building a local-view local map from the measurement data of the unmanned ground vehicle, and building a global-view local map from the measurement data of the unmanned aerial vehicle;
detecting the visual marker through the unmanned aerial vehicle, acquiring the relative pose between the unmanned aerial vehicle and the unmanned ground vehicle when the detection is successful, and performing air-ground-view map fusion and optimization by using the transformation given by the relative pose;
continuously detecting, based on loop closure, whether the agents pass through an overlapping area, and when an overlapping area is detected, establishing the association between the two agents' maps through the matched keyframes to perform trajectory calibration and similar-view map fusion and optimization;
and obtaining a globally consistent pose trajectory and map from the map obtained by the air-ground-view fusion and optimization and the map obtained by the similar-view fusion and optimization.
In addition, the air-ground multi-mode multi-agent cooperative positioning and mapping method according to the embodiment of the application can also have the following additional technical characteristics:
further, in an embodiment of the present application, the acquiring measurement data of the agent includes:
preprocessing the measurement data, including visual detection, optical flow tracking and inertial measurement unit IMU pre-integration; wherein,
the visual detection and optical flow tracking includes: solving the velocity vector of the optical flow by the least squares method from the optical-flow constraint equation,

$$\begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -I_t ,$$

wherein $I_x$ and $I_y$ are the gradients of the pixel brightness in the image in the $x$ and $y$ directions, $I_t$ is the gradient in the time direction $t$, and $u$ and $v$ are the velocity components of the optical flow along the $x$ and $y$ axes;
the inertial measurement unit IMU pre-integration includes:

$$p^{w}_{b_{k+1}} = p^{w}_{b_k} + v^{w}_{b_k}\,\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}\,\hat{a}_t - g^{w}\right)dt^2$$

$$v^{w}_{b_{k+1}} = v^{w}_{b_k} + \int_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}\,\hat{a}_t - g^{w}\right)dt$$

$$q^{w}_{b_{k+1}} = q^{w}_{b_k} \otimes \int_{t\in[t_k,t_{k+1}]} \tfrac{1}{2}\, q^{b_k}_{t} \otimes \begin{bmatrix} 0 \\ \hat{\omega}_t \end{bmatrix} dt$$

wherein $b$ denotes the IMU coordinate system, $w$ denotes the coordinate system of the IMU when the origin is initialized, i.e. the world coordinate system, $\hat{a}_t$ and $\hat{\omega}_t$ are the acceleration and angular velocity measured by the IMU, $R^{w}_{t}$ is the rotation from the IMU coordinate system to the world coordinate system at time $t$, and $\otimes$ denotes right multiplication of quaternions;

integrating all the IMU data between the $k$-th frame and the $(k+1)$-th frame gives the position $p^{w}_{b_{k+1}}$, velocity $v^{w}_{b_{k+1}}$ and rotation $q^{w}_{b_{k+1}}$ of the $(k+1)$-th frame, which are used as initial values for the visual estimation, the rotation being in quaternion form.
Further, in an embodiment of the present application, the local-view local map building from the measurement data of the unmanned ground vehicle and the global-view local map building from the measurement data of the unmanned aerial vehicle include:
solving the poses of all frames and the three-dimensional positions of all landmark points in the sliding window using SFM, and aligning them with the IMU pre-integration values to obtain the angular velocity bias, the gravity direction, the scale factor and the velocity corresponding to each frame;
optimizing the state variables within the window using a sliding-window approach; the optimized state vector in the time window is

$$\mathcal{X} = \left[\, x_0,\; x_1,\; \ldots,\; x_n \,\right], \qquad x_k = \left[\, p^{w}_{b_k},\; v^{w}_{b_k},\; q^{w}_{b_k},\; b_a,\; b_g \,\right],$$

wherein $q^{w}_{b_k}$ and $p^{w}_{b_k}$ are the rotation and translation parts of the camera pose, $v^{w}_{b_k}$ is the velocity of the camera in the world coordinate system, and $b_a$ and $b_g$ are the acceleration bias and the angular velocity bias of the IMU, respectively; the optimization objective function for the state of the system is

$$\min_{\mathcal{X}}\left\{ \left\| r_p - H_p\,\mathcal{X} \right\|^2 + \sum_{k\in\mathcal{B}} \left\| r_{\mathcal{B}}\!\left(\hat{z}^{b_k}_{b_{k+1}},\,\mathcal{X}\right) \right\|^2 + \sum_{(l,j)\in\mathcal{C}} \left\| r_{\mathcal{C}}\!\left(\hat{z}^{c_j}_{l},\,\mathcal{X}\right) \right\|^2 \right\},$$

wherein the minimizing $\mathcal{X}$ is the maximum a posteriori estimate, $\left\| r_p - H_p\,\mathcal{X} \right\|^2$ is the sliding-window prior residual, $r_{\mathcal{B}}$ is the IMU observation residual, and $r_{\mathcal{C}}$ is the camera observation residual.
Further, in an embodiment of the present application, the detecting whether the agent passes through the overlapping area based on the loop includes:
clustering all visual features, wherein each class of features forms a word and all the words form a dictionary;
describing each image with a single bag-of-words vector;

calculating the similarity between two images A and B as

$$s\!\left(\mathbf{v}_A, \mathbf{v}_B\right) = 1 - \frac{1}{2}\sum_{i}\left| \frac{v_{Ai}}{\left\|\mathbf{v}_A\right\|_1} - \frac{v_{Bi}}{\left\|\mathbf{v}_B\right\|_1} \right|,$$

wherein $\mathbf{v}_A$ is the vector describing image A, $\mathbf{v}_B$ is the vector describing image B, $s(\mathbf{v}_A, \mathbf{v}_B)$ is the similarity between the two images A and B, and $v_{Ai}$ and $v_{Bi}$ are the $i$-th components of the vectors describing images A and B, respectively;

and if the similarity exceeds a threshold value, a loop closure is considered to have occurred.
Further, in an embodiment of the present application, the clustering all visual features includes:
at the root node, all samples are clustered using the K-means algorithmClass, obtain the first layer;
for each node of the first layer, re-aggregating samples belonging to that node into a plurality of nodesClass, obtaining the next layer; and so on, finally obtaining a leaf layer, wherein the leaf layer is a word.
Further, in one embodiment of the present application, the describing an image with a single vector includes:
defining $n_i$ as the number of features contained in word $w_i$, $n$ as the number of features contained in all words, $m_i$ as the number of times word $w_i$ appears in image A, and $m$ as the total number of times all words appear in image A; then

$$\mathrm{IDF}_i = \log\frac{n}{n_i}, \qquad \mathrm{TF}_i = \frac{m_i}{m},$$

and the weight of word $w_i$ in image A is

$$\eta_i = \mathrm{TF}_i \times \mathrm{IDF}_i ;$$

through the bag of words, image A is described by a single vector $\mathbf{v}_A$:

$$\mathbf{v}_A = \left\{ \left(w_1, \eta_1\right), \left(w_2, \eta_2\right), \ldots, \left(w_N, \eta_N\right) \right\},$$

wherein $w_i$ are the words of the dictionary that appear in image A, $\eta_i$ is the weight corresponding to $w_i$, and $\mathbf{v}_A$ is the vector describing image A.
In order to achieve the above object, an embodiment of a second aspect of the present application provides an air-ground multi-mode multi-agent cooperative positioning and mapping device, comprising the following modules:
the acquisition module is used for acquiring measurement data of the agents, wherein the agents comprise an unmanned aerial vehicle and an unmanned ground vehicle, and the unmanned ground vehicle is provided with a visual marker;
the map building module is used for building a local-view local map from the measurement data of the unmanned ground vehicle and building a global-view local map from the measurement data of the unmanned aerial vehicle;
the air-ground view fusion module is used for detecting the visual marker through the unmanned aerial vehicle, acquiring the relative pose between the unmanned aerial vehicle and the unmanned ground vehicle when the detection is successful, and performing air-ground-view map fusion and optimization by using the transformation given by the relative pose;
the similar-view fusion module is used for continuously detecting, based on loop closure, whether the agents pass through an overlapping area, and, when an overlapping area is detected, establishing the association between the two agents' maps through the matched keyframes to perform trajectory calibration and similar-view map fusion and optimization;
and the output module is used for obtaining a globally consistent pose trajectory and map from the map obtained by the air-ground-view fusion and optimization and the map obtained by the similar-view fusion and optimization.
To achieve the above object, an embodiment of a third aspect of the present application provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the air-ground multi-mode multi-agent collaborative positioning and mapping method described above.
To achieve the above object, an embodiment of a fourth aspect of the present application provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the air-ground multi-mode multi-agent collaborative positioning and mapping method described above.
The air-ground multi-mode multi-agent cooperative positioning and mapping method provided by the embodiments of the application is suitable for a variety of application scenarios and can significantly improve the perception capability and working efficiency of a single-robot system. In the search-and-rescue field in particular, an aerial Unmanned Aerial Vehicle (UAV) surveys unknown terrain from its aerial viewpoint and guides an Unmanned Ground Vehicle (UGV) into the target area to carry out a precise rescue task, so that search and rescue are accomplished more efficiently and at lower cost than by manual effort. The aerial UAV has a global view of the ground; combined with the local environment-perception capability of the ground robot, a globally consistent environment map model can be constructed, and this map model can provide the ground robot with the information necessary for global navigation.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of a method for collaborative positioning and mapping of air-ground multi-mode multi-agents according to an embodiment of the present application.
FIG. 2 is a schematic diagram of a system for collaborative positioning and mapping of air-to-ground multi-modal multi-agents according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a device for collaborative positioning and mapping of air-to-ground multi-mode multi-agents according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The air-ground multi-mode multi-agent cooperative positioning and mapping method of the embodiment of the application is described below with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of a method for collaborative positioning and mapping of air-ground multi-mode multi-agents according to an embodiment of the present application.
As shown in FIG. 1, the air-ground multi-mode multi-agent cooperative positioning and mapping method comprises the following steps:
s101: acquiring measurement data of an intelligent agent, wherein the intelligent agent comprises an unmanned aerial vehicle and an unmanned aerial vehicle, and the unmanned aerial vehicle is provided with a visual marker;
further, in one embodiment of the present application, obtaining measurement data of an agent includes:
preprocessing measurement data, including visual detection, optical flow tracking and inertial measurement unit IMU pre-integration; wherein,
visual detection and optical flow tracking include: solving the velocity vector of the optical flow by the least squares method from the optical-flow constraint equation,

$$\begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -I_t ,$$

wherein $I_x$ and $I_y$ are the gradients of the pixel brightness in the image in the $x$ and $y$ directions, $I_t$ is the gradient in the time direction $t$, and $u$ and $v$ are the velocity components of the optical flow along the $x$ and $y$ axes;
the inertial measurement unit IMU pre-integration includes:

$$p^{w}_{b_{k+1}} = p^{w}_{b_k} + v^{w}_{b_k}\,\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}\,\hat{a}_t - g^{w}\right)dt^2$$

$$v^{w}_{b_{k+1}} = v^{w}_{b_k} + \int_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}\,\hat{a}_t - g^{w}\right)dt$$

$$q^{w}_{b_{k+1}} = q^{w}_{b_k} \otimes \int_{t\in[t_k,t_{k+1}]} \tfrac{1}{2}\, q^{b_k}_{t} \otimes \begin{bmatrix} 0 \\ \hat{\omega}_t \end{bmatrix} dt$$

wherein $b$ denotes the IMU coordinate system, $w$ denotes the coordinate system of the IMU when the origin is initialized, i.e. the world coordinate system, $\hat{a}_t$ and $\hat{\omega}_t$ are the acceleration and angular velocity measured by the IMU, $R^{w}_{t}$ is the rotation from the IMU coordinate system to the world coordinate system at time $t$, and $\otimes$ denotes right multiplication of quaternions;

integrating all the IMU data between the $k$-th frame and the $(k+1)$-th frame gives the position $p^{w}_{b_{k+1}}$, velocity $v^{w}_{b_{k+1}}$ and rotation $q^{w}_{b_{k+1}}$ of the $(k+1)$-th frame, which are used as initial values for the visual estimation, the rotation being in quaternion form.
Specifically, FAST features are selected for feature extraction and optical flow tracking. In each new image, the existing feature points are tracked with the KLT algorithm and new feature points are detected. To keep the feature points evenly distributed, the image is divided into several sub-regions of identical size and at most 10 FAST corners are extracted in each sub-region, so that the number of corners per image stays within a certain range. In outdoor scenes the displacement between two adjacent frames is large and the brightness of individual pixels may change abruptly, which harms feature tracking; therefore the feature points are projected onto the unit sphere after outlier rejection. Outlier rejection uses the RANSAC algorithm to achieve more robust optical flow tracking in outdoor dynamic scenes.
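As an illustration of this front end, the following Python sketch strings together KLT tracking, RANSAC outlier rejection and grid-balanced FAST detection with OpenCV. It is only a minimal sketch of the pipeline described above, not the patented implementation; the grid size, the per-cell corner limit of 10 and the RANSAC threshold are illustrative values, and the unit-sphere projection step is omitted.

```python
import cv2
import numpy as np

def track_features(prev_img, cur_img, prev_pts, grid=(8, 6), max_per_cell=10):
    """Minimal VIO front-end sketch: KLT tracking + RANSAC rejection + FAST refill."""
    # Track existing corners with pyramidal Lucas-Kanade optical flow.
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_img, cur_img, prev_pts, None, winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    good_prev, good_cur = prev_pts[ok], cur_pts[ok]

    # Reject outliers with RANSAC on the fundamental matrix (needs >= 8 matches).
    if len(good_cur) >= 8:
        _, inliers = cv2.findFundamentalMat(good_prev, good_cur,
                                            cv2.FM_RANSAC, 1.0, 0.99)
        if inliers is not None:
            keep = inliers.ravel() == 1
            good_prev, good_cur = good_prev[keep], good_cur[keep]

    # Detect new FAST corners, keeping at most `max_per_cell` per grid cell
    # so that the corners stay evenly distributed over the image.
    h, w = cur_img.shape[:2]
    fast = cv2.FastFeatureDetector_create(threshold=20)
    keypoints = fast.detect(cur_img, None)
    cell_w, cell_h = w / grid[0], h / grid[1]
    counts = np.zeros(grid, dtype=int)
    new_pts = []
    for kp in sorted(keypoints, key=lambda k: -k.response):
        cx = min(int(kp.pt[0] // cell_w), grid[0] - 1)
        cy = min(int(kp.pt[1] // cell_h), grid[1] - 1)
        if counts[cx, cy] < max_per_cell:
            counts[cx, cy] += 1
            new_pts.append(kp.pt)
    new_pts = np.float32(new_pts).reshape(-1, 1, 2)

    # A real front end would also suppress new corners that fall on tracked ones.
    return good_cur.reshape(-1, 1, 2), new_pts
```

Here `prev_pts` is expected to be an N x 1 x 2 float32 array, as returned by a previous call or by cv2.goodFeaturesToTrack.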
IMU pre-integration is performed at the same time as visual detection and tracking. The IMU responds quickly, is unaffected by imaging quality and can estimate absolute scale, complementing visual positioning on unstructured outdoor surfaces. If, during camera pose estimation, the poses at every IMU sampling instant between frames were inserted into the optimization, the program's efficiency would drop; therefore IMU pre-integration is needed, which converts the high-frequency acceleration and angular velocity measurements into a single observation and re-linearizes it during the nonlinear iterations to form a constraint factor on the inter-frame states.
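To make the pre-integration idea concrete, the sketch below accumulates the IMU samples between two image frames into a single relative position/velocity/rotation observation expressed in the body frame of frame k. It is a simplified illustration under stated assumptions (constant sampling period, fixed biases, no noise-covariance or bias-Jacobian propagation), and the function and variable names are the editor's, not the patent's.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def preintegrate(imu_samples, dt, ba=np.zeros(3), bg=np.zeros(3)):
    """Collapse the IMU samples between frame k and frame k+1 into one observation.

    imu_samples: iterable of (accel, gyro) measurements at a fixed period dt.
    Returns (alpha, beta, gamma): position increment, velocity increment and
    rotation increment, all expressed in the body frame of frame k.
    """
    alpha = np.zeros(3)            # pre-integrated position increment
    beta = np.zeros(3)             # pre-integrated velocity increment
    gamma = R.identity()           # pre-integrated rotation: frame k -> current sample
    for acc, gyr in imu_samples:
        acc = np.asarray(acc, float) - ba
        gyr = np.asarray(gyr, float) - bg
        a_k = gamma.apply(acc)                     # acceleration rotated into frame k
        alpha += beta * dt + 0.5 * a_k * dt ** 2
        beta += a_k * dt
        gamma = gamma * R.from_rotvec(gyr * dt)    # right-multiply the small rotation
    return alpha, beta, gamma.as_quat()            # quaternion as [x, y, z, w]
```

Gravity does not appear here because the increments are expressed in the body frame of frame k; it re-enters when these pre-integrated terms are compared with the world-frame states in the IMU residual.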
S102: building a local-view local map from the measurement data of the unmanned ground vehicle, and building a global-view local map from the measurement data of the unmanned aerial vehicle;
the initialization module is required to restore the scale of the monocular camera by loosely coupling the visual information and the IMU information. Firstly, solving the three-dimensional positions of the pose of all frames and all road mark points in a sliding window by using SFM, and then aligning the pose of all frames and the three-dimensional positions with the IMU pre-integral value obtained before, so as to solve the angular velocity bias, the gravity direction, the scale factor and the velocity corresponding to each frame. As systems run, the number of state variables increases, and sliding window methods are used to optimize the state variables within the window.
Further, in an embodiment of the present application, the local-view local map building from the measurement data of the unmanned ground vehicle and the global-view local map building from the measurement data of the unmanned aerial vehicle include:
solving the poses of all frames and the three-dimensional positions of all landmark points in the sliding window using SFM, and aligning them with the IMU pre-integration values to obtain the angular velocity bias, the gravity direction, the scale factor and the velocity corresponding to each frame;
optimizing the state variables within the window using a sliding-window approach; the optimized state vector in the time window is

$$\mathcal{X} = \left[\, x_0,\; x_1,\; \ldots,\; x_n \,\right], \qquad x_k = \left[\, p^{w}_{b_k},\; v^{w}_{b_k},\; q^{w}_{b_k},\; b_a,\; b_g \,\right],$$

wherein $q^{w}_{b_k}$ and $p^{w}_{b_k}$ are the rotation and translation parts of the camera pose, $v^{w}_{b_k}$ is the velocity of the camera in the world coordinate system, and $b_a$ and $b_g$ are the acceleration bias and the angular velocity bias of the IMU, respectively; the optimization objective function for the state of the system is

$$\min_{\mathcal{X}}\left\{ \left\| r_p - H_p\,\mathcal{X} \right\|^2 + \sum_{k\in\mathcal{B}} \left\| r_{\mathcal{B}}\!\left(\hat{z}^{b_k}_{b_{k+1}},\,\mathcal{X}\right) \right\|^2 + \sum_{(l,j)\in\mathcal{C}} \left\| r_{\mathcal{C}}\!\left(\hat{z}^{c_j}_{l},\,\mathcal{X}\right) \right\|^2 \right\},$$

wherein the minimizing $\mathcal{X}$ is the maximum a posteriori estimate, $\left\| r_p - H_p\,\mathcal{X} \right\|^2$ is the sliding-window prior residual, $r_{\mathcal{B}}$ is the IMU observation residual, and $r_{\mathcal{C}}$ is the camera observation residual.
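The structure of this objective, a prior term plus summed IMU and camera residuals over the window states, can be illustrated with a toy least-squares problem. The sketch below reduces the states to 2-D positions and uses placeholder residual models, so it only mirrors the shape of the cost function above, not the patent's actual IMU and reprojection factors.

```python
import numpy as np
from scipy.optimize import least_squares

def window_cost(x_flat, prior, odom_meas, lm_obs, landmarks):
    """Residual vector: prior on the first state + IMU-like terms + camera-like terms."""
    x = x_flat.reshape(-1, 2)                    # n window states (2-D positions only)
    res = [x[0] - prior]                         # sliding-window prior residual
    for k, dz in enumerate(odom_meas):           # IMU observation residuals (relative motion)
        res.append((x[k + 1] - x[k]) - dz)
    for k, j, z in lm_obs:                       # camera observation residuals (landmark - pose)
        res.append((landmarks[j] - x[k]) - z)
    return np.concatenate(res)

# Toy data: 4 poses in the window, 2 landmarks, noisy relative and landmark measurements.
rng = np.random.default_rng(0)
truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.1], [3.0, 0.1]])
landmarks = np.array([[1.5, 2.0], [2.5, 1.5]])
odom_meas = [truth[k + 1] - truth[k] + 0.01 * rng.standard_normal(2) for k in range(3)]
lm_obs = [(k, j, landmarks[j] - truth[k] + 0.02 * rng.standard_normal(2))
          for k in range(4) for j in range(2)]
prior = truth[0]

sol = least_squares(window_cost, np.zeros(truth.size),
                    args=(prior, odom_meas, lm_obs, landmarks))
print(sol.x.reshape(-1, 2))    # optimized window states, close to `truth`
```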
S103: detecting the visual marker through the unmanned aerial vehicle, acquiring the relative pose between the unmanned aerial vehicle and the unmanned ground vehicle when the detection is successful, and performing air-ground-view map fusion and optimization by using the transformation given by the relative pose;
A visual marker is mounted on top of the ground robot. The aerial robot runs marker detection whenever it observes the marker, and once detection succeeds, the relative pose between the aerial and ground agents together with the corresponding pair of keyframes is sent to the back end.
When the aerial end detects the visual marker carried by the ground robot, the pose of the marker in the onboard camera coordinate system, $T^{A}_{M}$, is obtained from the marker detection. Let the known transformation from the marker to the ground robot camera be $T^{M}_{G}$; then the transformation between the aerial camera coordinate system and the ground camera coordinate system at the current moment is $T^{A}_{G} = T^{A}_{M}\,T^{M}_{G}$, which is the pose transformation between the aerial and ground keyframes at the current time. Meanwhile, the keyframes $K_A$ and $K_G$ generated by the aerial and ground ends at the current moment are known, and so are their pose transformations with respect to their respective reference keyframes, $T^{\mathrm{ref}_A}_{K_A}$ and $T^{\mathrm{ref}_G}_{K_G}$; therefore the pose transformation matrix between the two maps can be obtained.
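The chain of coordinate transformations described above amounts to composing homogeneous 4x4 matrices. In the sketch below, the marker pose in the aerial camera frame is assumed to come from a fiducial detector and the marker-to-ground-camera transform from calibration; the matrix names mirror the notation above and the numeric values are placeholders.

```python
import numpy as np

def se3(Rm, t):
    """Build a 4x4 homogeneous transform from a rotation matrix Rm and translation t."""
    T = np.eye(4)
    T[:3, :3] = Rm
    T[:3, 3] = np.asarray(t)
    return T

# T_A_M: marker pose in the aerial camera frame (from marker detection).
# T_M_G: ground camera pose in the marker frame (known calibration).
T_A_M = se3(np.eye(3), [0.0, 0.0, 5.0])          # example values only
T_M_G = se3(np.eye(3), [0.0, -0.1, -0.3])

# Relative pose between the aerial and ground cameras at the current moment.
T_A_G = T_A_M @ T_M_G

# Keyframe-to-reference transforms maintained by each sub-end's local map.
T_refA_KA = se3(np.eye(3), [2.0, 0.0, 10.0])     # aerial keyframe in the aerial map frame
T_refG_KG = se3(np.eye(3), [1.0, 0.5, 0.0])      # ground keyframe in the ground map frame

# Transformation that expresses the ground map in the aerial map frame:
# map_A <- keyframe_A <- camera_G (= keyframe_G) <- map_G
T_mapA_mapG = T_refA_KA @ T_A_G @ np.linalg.inv(T_refG_KG)
print(T_mapA_mapG)
```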
S104: continuously detecting whether the intelligent agent passes through the overlapping area based on the loop, and when the overlapping area is detected, establishing association of two maps of the intelligent agent through the matched key frames to perform track calibration and similar view map fusion and optimization;
the loop detection method is based on a Bag of Words model (Bag-of-Words), and the DBoW2 library is applied to index the image database in a positive reverse order. The bag-of-words model is to calculate the similarity between the counted bag-of-words vector and the current frame to judge whether to generate loop.
First, all visual features are clustered; one class of features, i.e. a set of locally adjacent feature points, forms a "word", so the set of all words forms a "dictionary". If the feature points shared by many images are to be divided into $k$ classes at each node, the "dictionary" is organized as a $k$-ary tree for storage and query. By determining which "words" of the "dictionary" appear in an image, the image can be described with a vector.
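Building such a vocabulary can be sketched as hierarchical k-means over feature descriptors. The example below clusters float descriptors with scikit-learn's KMeans purely for illustration; DBoW2 itself builds its tree from binary BRIEF/ORB descriptors, so the branching factor, depth and distance metric here are stand-in assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, k=10, depth=3, level=0):
    """Recursively cluster descriptors into a k-ary tree; the leaves are the 'words'."""
    node = {"center": descriptors.mean(axis=0), "children": [], "is_leaf": False}
    if level == depth or len(descriptors) <= k:
        node["is_leaf"] = True          # a leaf node is one visual word
        return node
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)
    for c in range(k):
        subset = descriptors[km.labels_ == c]
        if len(subset):
            node["children"].append(build_vocab_tree(subset, k, depth, level + 1))
    return node

def lookup_word(node, desc, path=()):
    """Descend the tree to find which word (leaf) a descriptor falls into."""
    while not node["is_leaf"]:
        dists = [np.linalg.norm(desc - ch["center"]) for ch in node["children"]]
        i = int(np.argmin(dists))
        path = path + (i,)
        node = node["children"][i]
    return path   # tuple identifying the leaf, i.e. the word

# Example: 5000 random 32-D descriptors, k = 10, depth = 3 -> up to 1000 words.
descs = np.random.default_rng(1).random((5000, 32)).astype(np.float32)
tree = build_vocab_tree(descs, k=10, depth=3)
print(lookup_word(tree, descs[0]))
```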
Further, in one embodiment of the present application, detecting whether an agent passes through an overlap region based on a loop comprises:
clustering all visual features, wherein each class of features forms a word and all the words form a dictionary;
describing each image with a single bag-of-words vector;

calculating the similarity between two images A and B as

$$s\!\left(\mathbf{v}_A, \mathbf{v}_B\right) = 1 - \frac{1}{2}\sum_{i}\left| \frac{v_{Ai}}{\left\|\mathbf{v}_A\right\|_1} - \frac{v_{Bi}}{\left\|\mathbf{v}_B\right\|_1} \right|,$$

wherein $\mathbf{v}_A$ is the vector describing image A, $\mathbf{v}_B$ is the vector describing image B, $s(\mathbf{v}_A, \mathbf{v}_B)$ is the similarity between the two images A and B, and $v_{Ai}$ and $v_{Bi}$ are the $i$-th components of the vectors describing images A and B, respectively;

if the similarity exceeds the threshold, a loop closure is considered to have occurred.
Further, in one embodiment of the application, clustering all visual features includes:
at the root node, all samples are clustered using the K-means algorithmClass, obtain the first layer;
for each node of the first layer, the samples belonging to the node are re-aggregated intoClass, obtaining the next layer; and so on, finally obtaining a leaf layer, wherein the leaf layer is a word.
Further, in one embodiment of the present application, describing an image with a single vector includes:
defining $n_i$ as the number of features contained in word $w_i$, $n$ as the number of features contained in all words, $m_i$ as the number of times word $w_i$ appears in image A, and $m$ as the total number of times all words appear in image A; then

$$\mathrm{IDF}_i = \log\frac{n}{n_i}, \qquad \mathrm{TF}_i = \frac{m_i}{m},$$

and the weight of word $w_i$ in image A is

$$\eta_i = \mathrm{TF}_i \times \mathrm{IDF}_i ;$$

through the bag of words, image A is described by a single vector $\mathbf{v}_A$:

$$\mathbf{v}_A = \left\{ \left(w_1, \eta_1\right), \left(w_2, \eta_2\right), \ldots, \left(w_N, \eta_N\right) \right\},$$

wherein $w_i$ are the words of the dictionary that appear in image A, $\eta_i$ is the weight corresponding to $w_i$, and $\mathbf{v}_A$ is the vector describing image A.
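Putting the weighting and the scoring together, the sketch below computes TF-IDF weighted bag-of-words vectors for two images from their word occurrence counts and evaluates an L1-style similarity of the kind used above. The dictionary statistics and word counts are toy values; in the real system they come from passing each frame's descriptors through the vocabulary tree.

```python
import numpy as np

def tfidf_vector(word_counts, n_features_per_word, n_features_total):
    """word_counts: {word_id: number of occurrences of that word in this image}."""
    m = sum(word_counts.values())                       # all word occurrences in the image
    vec = {}
    for w, m_i in word_counts.items():
        tf = m_i / m
        idf = np.log(n_features_total / n_features_per_word[w])
        vec[w] = tf * idf                               # eta_i = TF_i * IDF_i
    return vec

def similarity(vA, vB):
    """L1-style score: 1 - 0.5 * || vA/|vA|_1 - vB/|vB|_1 ||_1, in [0, 1]."""
    nA = sum(abs(x) for x in vA.values())
    nB = sum(abs(x) for x in vB.values())
    words = set(vA) | set(vB)
    diff = sum(abs(vA.get(w, 0.0) / nA - vB.get(w, 0.0) / nB) for w in words)
    return 1.0 - 0.5 * diff

# Toy dictionary statistics and two images' word counts.
n_feat_per_word = {0: 120, 1: 300, 2: 80, 3: 500}
n_feat_total = 1000
img_a = {0: 4, 1: 2, 2: 1}
img_b = {0: 3, 1: 1, 3: 2}
vA, vB = (tfidf_vector(c, n_feat_per_word, n_feat_total) for c in (img_a, img_b))
print(similarity(vA, vB))      # compare against a loop-closure threshold
```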
S105: and obtaining the pose track and the map which are globally consistent according to the map obtained by the fusion and optimization of the air-ground view angle and the map obtained by the fusion and optimization of the similar view angles.
The map fusion and optimization module is divided into a similar-view part and an air-ground-view part. The air-ground-view part is executed only when the back end receives notice that the visual marker has been successfully detected; it then directly uses the relative pose relation to transform and match the aerial-end and ground-end maps and to perform conversion, fusion and optimization.
the similar visual angle is always performed, and the newly-put key frame continuously detects whether the child terminal passes through the overlapping area based on the loop, wherein the overlapping area comprises the loop area of the child terminal and the overlapping area between the child terminals. Once the overlapping of the areas passed by the two sub-ends is detected, the association of the two maps is established through the matched key frames, the coordinate conversion relation is obtained, the map fusion is further carried out, the two maps are combined after the map fusion, the maps are deleted from the map stack, and a new global map generated by the combination of the maps is added into the map stack and directly associated with the two sub-ends.
After the back end finishes optimization, the optimized poses are sent to the corresponding sub-ends; each sub-end updates its poses and uses them as constraints in its pose graph for local optimization, so that the sub-end's local map is refined and the subsequent localization and mapping proceed more accurately.
Finally, the globally consistent pose trajectory and map are output.
Fig. 2 is a schematic diagram of the air-ground multi-mode multi-agent collaborative positioning and mapping system according to an embodiment of the present application, which is divided into two parts: the sub-ends and the back end. The unmanned aerial vehicle and the unmanned ground vehicle are both referred to as agents. Each agent independently runs a sub-end that carries a camera, an IMU, a gimbal with its control system, a communication unit that can exchange data with the central server, and an onboard processing unit; each sub-end runs an independent front-end visual-inertial odometer, sends keyframes and map points to the back end, and maintains a smaller-scale local map. The back end receives the information from each sub-end, performs the computation-heavy operations such as visual marker detection, loop detection and fusion optimization on the central server, outputs the pose trajectories, and creates a globally consistent map. The system uses ROS for communication between the sub-ends and the back end: the sub-ends transmit the captured keyframes and map points to the back end, and the back end transmits the updated poses back to the sub-ends.
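A stripped-down version of the sub-end/back-end exchange can be written with rospy, as in the sketch below: the sub-end publishes keyframe poses and listens for optimized poses coming back. The topic names, the use of a bare PoseStamped message and the single-topic layout are simplifying assumptions for illustration; the actual system also transmits map points and keyframe features, which would require custom message types.

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import PoseStamped

class SubEnd:
    """Sketch of one agent's communication side: send keyframes, receive updates."""
    def __init__(self, agent_id):
        self.kf_pub = rospy.Publisher(f"/{agent_id}/keyframe", PoseStamped, queue_size=10)
        rospy.Subscriber(f"/{agent_id}/optimized_pose", PoseStamped, self.on_update)

    def publish_keyframe(self, x, y, z):
        msg = PoseStamped()
        msg.header.stamp = rospy.Time.now()
        msg.header.frame_id = "local_map"
        msg.pose.position.x, msg.pose.position.y, msg.pose.position.z = x, y, z
        msg.pose.orientation.w = 1.0
        self.kf_pub.publish(msg)

    def on_update(self, msg):
        # Use the back end's optimized pose as a constraint in the local pose graph.
        rospy.loginfo("received optimized pose at t=%.2f", msg.header.stamp.to_sec())

if __name__ == "__main__":
    rospy.init_node("uav_sub_end")
    node = SubEnd("uav")
    rate = rospy.Rate(1.0)
    while not rospy.is_shutdown():
        node.publish_keyframe(0.0, 0.0, 1.5)   # placeholder keyframe pose
        rate.sleep()
```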
The air-ground multi-mode multi-agent cooperative positioning and mapping method provided by the application is suitable for a variety of application scenarios and significantly improves the perception capability and working efficiency of a single-robot system. In the search-and-rescue field in particular, an aerial Unmanned Aerial Vehicle (UAV) surveys unknown terrain from its aerial viewpoint and guides an Unmanned Ground Vehicle (UGV) into the target area to carry out a precise rescue task, so that search and rescue are accomplished more efficiently and at lower cost than by manual effort. The aerial UAV has a global view of the ground; combined with the local environment-perception capability of the ground robot, a globally consistent environment map model can be constructed, and this map model can provide the ground robot with the information necessary for global navigation.
Compared with the prior art, the application has the advantages that:
1) Pure visual SLAM has several drawbacks: vision captures texture features in a scene well but captures structural features of the environment poorly, it is sensitive to initialization and illumination, and a monocular camera cannot recover the absolute scale of the pose and the map. In this application, an IMU sensor is added on top of visual detection and tracking for multi-mode information fusion; the IMU responds quickly, is unaffected by imaging quality and can estimate absolute scale, complementing visual positioning on unstructured outdoor surfaces.
2) Vibration generated while the agent moves is transmitted through the rigid mounting structure to the camera and degrades the front end's acquisition of feature points; in this system each sub-end therefore carries its camera on a gimbal with its own control system, which attenuates such vibration.
In order to realize the embodiment, the application also provides an air-ground multi-mode multi-agent cooperative positioning and mapping device.
FIG. 3 is a schematic diagram of a device for collaborative positioning and mapping of air-to-ground multi-mode multi-agents according to an embodiment of the present application.
As shown in fig. 3, the air-ground multi-mode multi-agent cooperative positioning and mapping device comprises an acquisition module 100, a mapping module 200, an air-ground view fusion module 300, a similar-view fusion module 400 and an output module 500, wherein,
the acquisition module is used for acquiring measurement data of the agents, wherein the agents comprise an unmanned aerial vehicle and an unmanned ground vehicle, and the unmanned ground vehicle is provided with a visual marker;
the map building module is used for building a local-view local map from the measurement data of the unmanned ground vehicle and building a global-view local map from the measurement data of the unmanned aerial vehicle;
the air-ground view fusion module is used for detecting the visual marker through the unmanned aerial vehicle, acquiring the relative pose between the unmanned aerial vehicle and the unmanned ground vehicle when the detection is successful, and performing air-ground-view map fusion and optimization by using the transformation given by the relative pose;
the similar-view fusion module is used for continuously detecting, based on loop closure, whether the agents pass through an overlapping area, and, when an overlapping area is detected, establishing the association between the two agents' maps through the matched keyframes to perform trajectory calibration and similar-view map fusion and optimization;
and the output module is used for obtaining a globally consistent pose trajectory and map from the map obtained by the air-ground-view fusion and optimization and the map obtained by the similar-view fusion and optimization.
To achieve the above object, an embodiment of the present application provides a computer device, which is characterized by comprising a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the air-to-ground multi-mode multi-agent cooperative positioning and mapping method as described above when executing the computer program.
To achieve the above object, a fourth aspect of the present application provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the air-to-ground multi-modal multi-agent collaborative positioning and mapping method as described above.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (9)

1. The air-ground multi-mode multi-agent cooperative positioning and mapping method is characterized by comprising the following steps of:
acquiring measurement data of the agents, wherein the agents comprise an unmanned aerial vehicle and an unmanned ground vehicle, and the unmanned ground vehicle is provided with a visual marker;
building a local-view local map from the measurement data of the unmanned ground vehicle, and building a global-view local map from the measurement data of the unmanned aerial vehicle;
detecting the visual marker through the unmanned aerial vehicle, acquiring the relative pose between the unmanned aerial vehicle and the unmanned ground vehicle when the detection is successful, and performing air-ground-view map fusion and optimization by using the transformation given by the relative pose;
continuously detecting, based on loop closure, whether the agents pass through an overlapping area, and when an overlapping area is detected, establishing the association between the two agents' maps through the matched keyframes to perform trajectory calibration and similar-view map fusion and optimization;
and obtaining a globally consistent pose trajectory and map from the map obtained by the air-ground-view fusion and optimization and the map obtained by the similar-view fusion and optimization.
2. The method of claim 1, wherein the obtaining measurement data of the agent comprises:
preprocessing the measurement data, including visual detection, optical flow tracking and inertial measurement unit IMU pre-integration; wherein,
the visual detection and optical flow tracking includes: solving the velocity vector of the optical flow by the least squares method from the optical-flow constraint equation,

$$\begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -I_t ,$$

wherein $I_x$ and $I_y$ are the gradients of the pixel brightness in the image in the $x$ and $y$ directions, $I_t$ is the gradient in the time direction $t$, and $u$ and $v$ are the velocity components of the optical flow along the $x$ and $y$ axes;
the inertial measurement unit IMU pre-integration includes:

$$p^{w}_{b_{k+1}} = p^{w}_{b_k} + v^{w}_{b_k}\,\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}\,\hat{a}_t - g^{w}\right)dt^2$$

$$v^{w}_{b_{k+1}} = v^{w}_{b_k} + \int_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}\,\hat{a}_t - g^{w}\right)dt$$

$$q^{w}_{b_{k+1}} = q^{w}_{b_k} \otimes \int_{t\in[t_k,t_{k+1}]} \tfrac{1}{2}\, q^{b_k}_{t} \otimes \begin{bmatrix} 0 \\ \hat{\omega}_t \end{bmatrix} dt$$

wherein $b$ denotes the IMU coordinate system, $w$ denotes the coordinate system of the IMU when the origin is initialized, i.e. the world coordinate system, $\hat{a}_t$ and $\hat{\omega}_t$ are the acceleration and angular velocity measured by the IMU, $R^{w}_{t}$ is the rotation from the IMU coordinate system to the world coordinate system at time $t$, and $\otimes$ denotes right multiplication of quaternions;

integrating all the IMU data between the $k$-th frame and the $(k+1)$-th frame gives the position $p^{w}_{b_{k+1}}$, velocity $v^{w}_{b_{k+1}}$ and rotation $q^{w}_{b_{k+1}}$ of the $(k+1)$-th frame, which are used as initial values for the visual estimation, the rotation being in quaternion form.
3. The method of claim 1, wherein the building of the local-view local map from the measurement data of the unmanned ground vehicle comprises:
solving the poses of all frames and the three-dimensional positions of all landmark points in the sliding window using SFM, and aligning them with the IMU pre-integration values to obtain the angular velocity bias, the gravity direction, the scale factor and the velocity corresponding to each frame;
optimizing the state variables within the window using a sliding-window approach; the optimized state vector in the time window is

$$\mathcal{X} = \left[\, x_0,\; x_1,\; \ldots,\; x_n \,\right], \qquad x_k = \left[\, p^{w}_{b_k},\; v^{w}_{b_k},\; q^{w}_{b_k},\; b_a,\; b_g \,\right],$$

wherein $q^{w}_{b_k}$ and $p^{w}_{b_k}$ are the rotation and translation parts of the camera pose, $v^{w}_{b_k}$ is the velocity of the camera in the world coordinate system, and $b_a$ and $b_g$ are the acceleration bias and the angular velocity bias of the IMU, respectively; the optimization objective function for the state of the system is

$$\min_{\mathcal{X}}\left\{ \left\| r_p - H_p\,\mathcal{X} \right\|^2 + \sum_{k\in\mathcal{B}} \left\| r_{\mathcal{B}}\!\left(\hat{z}^{b_k}_{b_{k+1}},\,\mathcal{X}\right) \right\|^2 + \sum_{(l,j)\in\mathcal{C}} \left\| r_{\mathcal{C}}\!\left(\hat{z}^{c_j}_{l},\,\mathcal{X}\right) \right\|^2 \right\},$$

wherein the minimizing $\mathcal{X}$ is the maximum a posteriori estimate, $\left\| r_p - H_p\,\mathcal{X} \right\|^2$ is the sliding-window prior residual, $r_{\mathcal{B}}$ is the IMU observation residual, and $r_{\mathcal{C}}$ is the camera observation residual.
4. The method of claim 1, wherein the loop-based continuous detection of whether the agent passes through an overlap region comprises:
clustering all visual features, wherein each class of features forms a word and all the words form a dictionary;
describing each image with a single bag-of-words vector;

calculating the similarity between two images A and B as

$$s\!\left(\mathbf{v}_A, \mathbf{v}_B\right) = 1 - \frac{1}{2}\sum_{i}\left| \frac{v_{Ai}}{\left\|\mathbf{v}_A\right\|_1} - \frac{v_{Bi}}{\left\|\mathbf{v}_B\right\|_1} \right|,$$

wherein $\mathbf{v}_A$ is the vector describing image A, $\mathbf{v}_B$ is the vector describing image B, $s(\mathbf{v}_A, \mathbf{v}_B)$ is the similarity between the two images A and B, and $v_{Ai}$ and $v_{Bi}$ are the $i$-th components of the vectors describing images A and B, respectively;

and if the similarity exceeds a threshold value, a loop closure is considered to have occurred.
5. The method of claim 4, wherein the clustering all visual features comprises:
at the root node, all samples are clustered using the K-means algorithmClass, obtain the first layer;
for each node of the first layer, re-aggregating samples belonging to that node into a plurality of nodesClass, obtaining the next layer; and so on, finally obtaining a leaf layer, wherein the leaf layer is a word.
6. The method of claim 4, wherein said describing an image with a single vector comprises:
defining $n_i$ as the number of features contained in word $w_i$, $n$ as the number of features contained in all words, $m_i$ as the number of times word $w_i$ appears in image A, and $m$ as the total number of times all words appear in image A; then

$$\mathrm{IDF}_i = \log\frac{n}{n_i}, \qquad \mathrm{TF}_i = \frac{m_i}{m},$$

and the weight of word $w_i$ in image A is

$$\eta_i = \mathrm{TF}_i \times \mathrm{IDF}_i ;$$

through the bag of words, image A is described by a single vector $\mathbf{v}_A$:

$$\mathbf{v}_A = \left\{ \left(w_1, \eta_1\right), \left(w_2, \eta_2\right), \ldots, \left(w_N, \eta_N\right) \right\},$$

wherein $w_i$ are the words of the dictionary that appear in image A, $\eta_i$ is the weight corresponding to $w_i$, and $\mathbf{v}_A$ is the vector describing image A.
7. The air-ground multi-mode multi-agent cooperative positioning and mapping device is characterized by comprising the following modules:
the acquisition module is used for acquiring measurement data of the agents, wherein the agents comprise an unmanned aerial vehicle and an unmanned ground vehicle, and the unmanned ground vehicle is provided with a visual marker;
the map building module is used for building a local-view local map from the measurement data of the unmanned ground vehicle and building a global-view local map from the measurement data of the unmanned aerial vehicle;
the air-ground view fusion module is used for detecting the visual marker through the unmanned aerial vehicle, acquiring the relative pose between the unmanned aerial vehicle and the unmanned ground vehicle when the detection is successful, and performing air-ground-view map fusion and optimization by using the transformation given by the relative pose;
the similar-view fusion module is used for continuously detecting, based on loop closure, whether the agents pass through an overlapping area, and, when an overlapping area is detected, establishing the association between the two agents' maps through the matched keyframes to perform trajectory calibration and similar-view map fusion and optimization;
and the output module is used for obtaining a globally consistent pose trajectory and map from the map obtained by the air-ground-view fusion and optimization and the map obtained by the similar-view fusion and optimization.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the air-to-ground multi-modal multi-agent co-localization and mapping method of any one of claims 1-6 when the computer program is executed by the processor.
9. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the air-to-ground multi-modal multi-agent co-localization and mapping method of any one of claims 1-6.
CN202311248105.4A 2023-09-26 2023-09-26 Air-ground multi-mode multi-agent cooperative positioning and mapping method Active CN116989772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311248105.4A CN116989772B (en) 2023-09-26 2023-09-26 Air-ground multi-mode multi-agent cooperative positioning and mapping method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311248105.4A CN116989772B (en) 2023-09-26 2023-09-26 Air-ground multi-mode multi-agent cooperative positioning and mapping method

Publications (2)

Publication Number Publication Date
CN116989772A true CN116989772A (en) 2023-11-03
CN116989772B (en) 2024-01-02

Family

ID=88526957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311248105.4A Active CN116989772B (en) 2023-09-26 2023-09-26 Air-ground multi-mode multi-agent cooperative positioning and mapping method

Country Status (1)

Country Link
CN (1) CN116989772B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117191005A (en) * 2023-11-08 2023-12-08 中国人民解放军国防科技大学 Air-ground heterogeneous collaborative mapping method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110044354A (en) * 2019-03-28 2019-07-23 东南大学 A kind of binocular vision indoor positioning and build drawing method and device
CN210377164U (en) * 2019-06-17 2020-04-21 酷黑科技(北京)有限公司 Air-ground cooperative operation system
CN112802104A (en) * 2021-02-04 2021-05-14 华南理工大学 Loop detection method based on RGB-D camera
CN114485619A (en) * 2022-01-26 2022-05-13 清华大学 Multi-robot positioning and navigation method and device based on air-ground cooperation
CN114966789A (en) * 2022-05-12 2022-08-30 华中科技大学 Mapping method and system fusing GNSS and multi-view vision
CN115993825A (en) * 2022-12-27 2023-04-21 中国人民解放军陆军工程大学 Unmanned vehicle cluster control system based on air-ground cooperation
WO2023065342A1 (en) * 2021-10-22 2023-04-27 华为技术有限公司 Vehicle, vehicle positioning method and apparatus, device, and computer-readable storage medium
CN116358520A (en) * 2023-03-16 2023-06-30 湖南大学 Man-machine multi-node collaborative semantic laser SLAM system and method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110044354A (en) * 2019-03-28 2019-07-23 东南大学 A kind of binocular vision indoor positioning and build drawing method and device
CN210377164U (en) * 2019-06-17 2020-04-21 酷黑科技(北京)有限公司 Air-ground cooperative operation system
CN112802104A (en) * 2021-02-04 2021-05-14 华南理工大学 Loop detection method based on RGB-D camera
WO2023065342A1 (en) * 2021-10-22 2023-04-27 华为技术有限公司 Vehicle, vehicle positioning method and apparatus, device, and computer-readable storage medium
CN114485619A (en) * 2022-01-26 2022-05-13 清华大学 Multi-robot positioning and navigation method and device based on air-ground cooperation
CN114966789A (en) * 2022-05-12 2022-08-30 华中科技大学 Mapping method and system fusing GNSS and multi-view vision
CN115993825A (en) * 2022-12-27 2023-04-21 中国人民解放军陆军工程大学 Unmanned vehicle cluster control system based on air-ground cooperation
CN116358520A (en) * 2023-03-16 2023-06-30 湖南大学 Man-machine multi-node collaborative semantic laser SLAM system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
徐文菁: "Dynamic cooperative design of UAV and unmanned vehicle in uncertain environments", Journal of Luoyang Institute of Science and Technology (Natural Science Edition), vol. 29, no. 4
胡秋霞: "Vehicle path planning based on UAV and unmanned vehicle cooperation", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 5
邹鹏程: "Research on cooperative navigation technology for micro air-ground unmanned systems", China Master's Theses Full-text Database, Information Science and Technology, no. 3

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117191005A (en) * 2023-11-08 2023-12-08 中国人民解放军国防科技大学 Air-ground heterogeneous collaborative mapping method, device, equipment and storage medium
CN117191005B (en) * 2023-11-08 2024-01-30 中国人民解放军国防科技大学 Air-ground heterogeneous collaborative mapping method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN116989772B (en) 2024-01-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant