CN117635721A - Target positioning method, related system and storage medium


Info

Publication number
CN117635721A
CN117635721A (Application CN202210980993.8A)
Authority
CN
China
Prior art keywords
marker
pose
target
global
image
Prior art date
Legal status
Pending
Application number
CN202210980993.8A
Other languages
Chinese (zh)
Inventor
龙云飞
彭成涛
朱森华
涂丹丹
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd filed Critical Huawei Cloud Computing Technologies Co Ltd
Priority to CN202210980993.8A priority Critical patent/CN117635721A/en
Priority to PCT/CN2023/086234 priority patent/WO2024036984A1/en
Publication of CN117635721A publication Critical patent/CN117635721A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The embodiment of the application provides a target positioning method, a related system and a storage medium. The method comprises the following steps: coarsely positioning a target in a preset area to obtain a coarse pose of the target; acquiring an image of a first marker in the preset area, and obtaining a global pose of the first marker according to the coarse pose of the target and the image of the first marker; obtaining a relative pose between the first marker and the target according to the image of the first marker and the pose of the target under the camera coordinate system; and obtaining a global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. With this method, an ultra-high-precision global pose estimate of the target can be obtained.

Description

Target positioning method, related system and storage medium
Technical Field
The present disclosure relates to the field of positioning technologies, and in particular, to a target positioning method, a related system, and a storage medium.
Background
With the continuous development of technology, autonomous driving, mobile robots and the like have reached a large market scale. Positioning is a very important basic task for mobile robots and autonomous driving, and achieving high-precision, robust and cost-effective positioning is very challenging. A single expensive sensor, such as a high-grade inertial measurement unit (Inertial Measurement Unit, IMU), can solve the positioning problem well, but since the cost is prohibitive for most scenarios, solutions that rely solely on expensive high-precision sensors have limited applicability. Low-cost multi-sensor fusion is therefore a more viable solution that covers more scenarios. Sensors such as laser radar (Lidar), vision cameras (Camera), inertial measurement units (IMU), wheel odometers, Radar and the global positioning system (Global Positioning System, GPS) each have their own advantages and disadvantages, and can be arranged and combined in many ways. The common Lidar+IMU fusion algorithm has difficulty with loop detection and relocalization; the common Camera+IMU fusion algorithm has difficulty with accurate depth estimation; the common Lidar+Camera+IMU algorithm does not achieve true multi-sensor tight coupling; and adding a Radar sensor on top of a conventional Lidar+Camera+IMU suite in order to cope with rain, snow and fog is costly. Certain scenes may also cause certain sensors to fail: a transformer substation scene has strong electromagnetic interference, so the positioning error of GPS or real-time kinematic (RTK) carrier-phase differential positioning becomes extremely large; road bumps may cause slipping and wear of a wheel odometer, so the accumulated error of a robot over long inspection runs grows; open scenes return few echoes, so a laser radar may not detect enough effective feature points; and sensors such as Lidar and cameras degrade severely in rain, fog or snow. As a result, the robot cannot recognize its own accurate pose.
Disclosure of Invention
The application discloses a target positioning method, a related system and a storage medium, which can achieve high-precision, robust and cost-effective positioning of a target.
In a first aspect, an embodiment of the present application provides a target positioning method, including:
coarsely positioning a target in a preset area to obtain the coarse pose of the target;
acquiring an image of a first marker in the preset area, and obtaining a global pose of the first marker according to the rough pose of the target and the image of the first marker;
obtaining the pose of the first marker under a camera coordinate system according to the image of the first marker;
acquiring the pose of the target under the camera coordinate system, and acquiring the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system;
and obtaining the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
In this method, a vehicle, a robot, a server or the like first performs coarse positioning of the target (for example an unmanned vehicle or a robot), then obtains the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtains the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system, and finally obtains the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. In other words, the target is first coarsely positioned and then finely positioned by combining the image of the first marker with the coarse pose, which helps to obtain an ultra-high-precision global pose estimate of the target.
For example, the above-described coarse positioning is performed based on a laser radar, and the fine positioning is performed based on a visual camera. In this scheme, laser-radar-based coarse positioning is performed first, followed by visual-camera-based fine positioning: the accuracy of laser radar positioning is 5-10 cm, while visual fine positioning can reach a positioning accuracy of about 1-2 cm. The combination of laser radar coarse positioning and visual fine positioning can meet customer demands and achieve high-precision, robust and cost-effective positioning of the target.
In one possible implementation manner, the obtaining the global pose of the first marker according to the coarse pose of the target and the image of the first marker includes:
obtaining a local map of the position of the target according to the rough pose of the target and the global point cloud map of the preset area;
obtaining a semantic locating local map according to a local map of the position of the target and a semantic locating global map, wherein the semantic locating global map comprises global poses of M markers, the semantic locating local map comprises global poses of N markers, the N markers are markers in the M markers, and M and N are positive integers;
And acquiring the global pose of the first marker from the semantic locating local map according to the image of the first marker.
Using the laser radar for coarse positioning ensures that, when visual fine positioning is required, the minimum operating requirement that the marker occupies more than 1/10 of the picture can be met; if only the visual fine positioning module were used, this minimum start-up requirement could not be guaranteed. In this scheme, laser-radar-based coarse positioning is performed first, followed by visual-camera-based fine positioning: the accuracy of laser radar positioning is 5-10 cm, which is difficult to meet a customer's high-precision requirement of 1-2 cm, while visual fine positioning can reach a positioning accuracy of about 1-2 cm. The combination of laser radar coarse positioning and visual fine positioning can meet customer demands and achieve high-precision, robust and cost-effective positioning of the target.
On the other hand, the global pose of the marker is looked up through the semantic locating local map; since there may be many candidate markers, narrowing the search range in this way makes the global pose estimation of the marker more accurate and improves the efficiency of target positioning.
In one possible implementation, the method further includes:
Performing three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain a three-dimensional model with textures of the M markers;
registering the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic locating global map.
The semantic locating global map can be established in an offline mode, and then an online mode is adopted to locate the target. By adopting the mode of separating the off-line module from the on-line module, the calculation power consumption of the on-line module can be greatly reduced, so that the hardware cost of a vehicle or a robot is greatly reduced, and the endurance is greatly improved.
In a possible implementation manner, the obtaining the pose of the first marker in the camera coordinate system according to the image of the first marker includes:
inputting the image of the first marker into a preset model for processing to obtain the pose of the first marker under a camera coordinate system, wherein training data of the preset model is obtained by performing one or more of replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement and reduction on the background in initial training image data and/or performing one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement and reduction on the marker in the initial training image data.
Training the model on augmented training data makes the obtained relative pose between the marker and the target more accurate and more robust.
In a second aspect, embodiments of the present application provide a target positioning device, including:
the coarse positioning module is used for performing coarse positioning on the target in the preset area to obtain the coarse pose of the target;
the first processing module is used for acquiring an image of a first marker in the preset area and obtaining a global pose of the first marker according to the rough pose of the target and the image of the first marker;
the second processing module is used for obtaining the pose of the first marker under a camera coordinate system according to the image of the first marker;
the third processing module is used for acquiring the pose of the target under the camera coordinate system and obtaining the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system;
and the positioning module is used for obtaining the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
With this apparatus, a vehicle, a robot, a server or the like first performs coarse positioning of the target (for example an unmanned vehicle or a robot), then obtains the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtains the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system, and finally obtains the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. In other words, the target is first coarsely positioned and then finely positioned by combining the image of the first marker with the coarse pose, which helps to obtain an ultra-high-precision global pose estimate of the target.
For example, the above-described coarse positioning is performed based on a laser radar, and the fine positioning is performed based on a visual camera. In this scheme, laser-radar-based coarse positioning is performed first, followed by visual-camera-based fine positioning: the accuracy of laser radar positioning is 5-10 cm, while visual fine positioning can reach a positioning accuracy of about 1-2 cm. The combination of laser radar coarse positioning and visual fine positioning can meet customer demands and achieve high-precision, robust and cost-effective positioning of the target.
In one possible implementation manner, the first processing module is configured to:
obtaining a local map of the position of the target according to the rough pose of the target and the global point cloud map of the preset area;
obtaining a semantic locating local map according to a local map of the position of the target and a semantic locating global map, wherein the semantic locating global map comprises global poses of M markers, the semantic locating local map comprises global poses of N markers, the N markers are markers in the M markers, and M and N are positive integers;
and acquiring the global pose of the first marker from the semantic locating local map according to the image of the first marker.
In a possible implementation manner, the apparatus further includes a fourth processing module, configured to:
performing three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain a three-dimensional model with textures of the M markers;
registering the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic locating global map.
In one possible implementation manner, the second processing module is further configured to:
Inputting the image of the first marker into a preset model for processing to obtain the pose of the first marker under a camera coordinate system, wherein training data of the preset model is obtained by performing one or more of replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement and reduction on the background in initial training image data and/or performing one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement and reduction on the marker in the initial training image data.
In a third aspect, the present application provides a cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory; wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform a method as provided by any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform a method as provided by any one of the possible implementations of the first aspect.
In a fifth aspect, the present embodiments provide a computer program product, which when run on a computer causes the computer to perform the method as provided by any one of the possible implementations of the first aspect.
It will be appreciated that the apparatus of the second aspect, the cluster of computing devices of the third aspect, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect provided above are each adapted to perform the method provided in any implementation of the first aspect. Therefore, for the advantageous effects they can achieve, reference may be made to the advantageous effects of the corresponding method, and details are not repeated herein.
Drawings
The drawings used in the embodiments of the present application are described below.
FIG. 1a is a schematic diagram of an architecture of an object positioning system according to an embodiment of the present application;
FIG. 1b is a schematic diagram of a system architecture of a vehicle according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a target positioning method according to an embodiment of the present application;
FIG. 3 is a flowchart of another object positioning method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of target positioning according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a target positioning device according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a hardware architecture of a computing device according to an embodiment of the present application;
fig. 7 is a schematic hardware structure of a computing device cluster according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application.
For ease of understanding, some concepts related to the embodiments of the present application are first described below by way of example for reference:
1. Deep learning (deep learning): a machine learning technique based on deep neural network algorithms, characterized by the use of multiple layers of non-linear transformations to process and analyze data. It is mainly applied to perception and decision scenarios in the field of artificial intelligence, such as image and speech recognition, natural language translation, and computer game playing.
2. Laser-vision fusion localization (lidar-vision fused localization): in mobile robots or autonomous driving, knowing where the robot is, i.e. positioning, is very important. The accuracy of positioning with a single lidar sensor or a single vision camera alone is often inadequate; a positioning method that fuses a lidar with a vision camera (possibly also with a wheel odometer and an inertial measurement unit) is known as laser-vision fusion localization.
3. Object pose estimation (object pose estimation): estimating the position and orientation of a target object in 6 degrees of freedom (degree of freedom, DoF); the 6DoF pose comprises the 3-dimensional position and the 3-dimensional spatial orientation. In the general three-dimensional case, the positional translation is described by variables along the three coordinate axes X, Y and Z, and the orientation is described by the amounts of rotation about the same three axes.
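For reference only (the notation below is ours and does not appear in the application), a 6DoF pose can be written compactly as a homogeneous transform combining the rotation and the translation:

```latex
T \;=\;
\begin{bmatrix}
R & \mathbf{t} \\
\mathbf{0}^{\top} & 1
\end{bmatrix} \in SE(3),
\qquad
R \;=\; R_z(\mathrm{yaw})\, R_y(\mathrm{pitch})\, R_x(\mathrm{roll}) \in SO(3),
\qquad
\mathbf{t} \;=\; (x,\, y,\, z)^{\top}
```

so that the three translation variables and the three rotation angles together make up the 6 degrees of freedom.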
4. Inertial measurement unit (inertial measurement unit): a device that measures the three-axis attitude angles (or angular rates) and the acceleration of an object. Generally, an IMU includes three single-axis accelerometers and three single-axis gyroscopes: the accelerometers detect the acceleration of the object along three independent axes of the carrier coordinate system, and the gyroscopes detect the angular velocity of the carrier relative to the navigation coordinate system. By measuring the angular velocity and acceleration of the object in three-dimensional space, the pose of the object can be calculated.
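As an illustration only (not part of the application), the following sketch shows the kind of dead-reckoning integration step that an IMU-based pose estimate relies on; the function name, the gravity constant and the constant-timestep assumption are ours:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def imu_step(rot: R, pos: np.ndarray, vel: np.ndarray,
             gyro: np.ndarray, accel: np.ndarray, dt: float):
    """One dead-reckoning step: propagate orientation, velocity and position
    from a body-frame angular rate (rad/s) and specific force (m/s^2)."""
    gravity = np.array([0.0, 0.0, -9.81])       # navigation-frame gravity
    a_nav = rot.apply(accel) + gravity          # rotate to the navigation frame, remove gravity
    new_pos = pos + vel * dt + 0.5 * a_nav * dt ** 2
    new_vel = vel + a_nav * dt
    new_rot = rot * R.from_rotvec(gyro * dt)    # integrate the angular rate
    return new_rot, new_pos, new_vel
```

In practice such raw integration drifts quickly, which is one reason the IMU is fused with other sensors as described above.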
The above exemplary description of the concepts may be applied in the following embodiments.
Because multi-sensor fusion positioning in the prior art cannot simultaneously achieve high precision, high robustness and good cost-effectiveness, the present application provides a target positioning method, a related system and a storage medium that can achieve high-precision, robust and cost-effective positioning of a target.
The system architecture of the embodiments of the present application will be described in detail below with reference to the accompanying drawings. Referring to fig. 1a, fig. 1a is a schematic diagram of an object positioning system, which includes a vehicle 101 and a server 102, and is applicable to an embodiment of the present application.
The vehicle 101 in the embodiment of the present application is a device that moves by power drive. The vehicle 101 is a device having communication capability and computing capability capable of providing mobile travel services to a user. The vehicle 101 typically includes a variety of subsystems including, for example, but not limited to, a travel system, a sensor system, a control system, one or more peripheral devices, a power source or user interface, and the like. Alternatively, vehicle 101 may include more or fewer subsystems, and each subsystem may include multiple elements. In addition, each of the subsystems and elements of vehicle 101 may be interconnected by wires or wirelessly.
It should be noted that, in the embodiment of the present application, the vehicle 101 may be an automobile or an electric vehicle, may be a vehicle running on a track, and may also be an intelligent vehicle (e.g., an unmanned vehicle) or an intelligent mobile robot.
The intelligent vehicle supports sensing the road environment through an on-board sensing system, automatically planning a driving route and controlling the vehicle to reach a preset target position. It integrates technologies such as computing, sensing, information fusion, communication, artificial intelligence and automatic control, and is a high-tech complex combining functions such as environment perception, planning and decision-making, and multi-level driving assistance. For example, the intelligent vehicle may be an automobile or a wheeled mobile robot having an assisted driving system or a fully automatic driving system.
The server 102 is a device with centralized computing capabilities. The server 102 may be implemented by a server, a virtual machine, a cloud, a roadside device, or a robot, for example.
When the server 102 comprises a server, the type of server includes, but is not limited to, a general purpose computer, a special purpose server computer, a blade server, and the like. The number of servers included in the server 102 is not limited, and may be one or multiple (e.g., a server cluster).
Virtual machines refer to computing modules that run in a completely isolated environment with complete hardware system functionality through software emulation. Of course, in addition to virtual machines, the server 102 may be implemented by other computing instances, such as containers, etc.
The cloud is a software platform adopting an application program virtualization technology, and can enable one or more software and applications to be developed and run in independent virtualization environments. Alternatively, when the server 102 is implemented by the cloud, the cloud may be deployed on a public cloud, a private cloud, or a hybrid cloud, or the like.
The road side device is a device provided on a road side (or an intersection, or a road side, etc.), and the road may be an outdoor road (for example, a main road, an auxiliary road, an overhead road, or a temporary road, etc.), or an indoor road (for example, a road in an indoor parking lot, etc.). The road side device is capable of providing services to the vehicle. It should be noted that the road side device may be a stand-alone device or may be integrated into other devices. For example, the roadside device may be integrated in a smart gas station, a charging post, a smart signal lamp, a street lamp, a utility pole, or a traffic sign, among other devices.
Fig. 1b is a schematic diagram of a system architecture of an exemplary vehicle 101, the vehicle 101 including a plurality of vehicle integrated units (vehicle integration unit, VIU), a communication box (TBOX), a cabin controller (cockpit domain controller, CDC), a mobile data center (mobile data center, MDC), and an overall vehicle controller (vehicle domain controller, VDC).
The vehicle 101 also includes various types of sensors disposed on the body, including: laser radar, millimeter wave, ultrasonic radar, camera device. Each type of sensor may include a plurality. It should be understood that the sensor number and position layout in fig. 1b is an illustration, and one skilled in the art can reasonably select the type, number and position layout of the sensors as desired. In fig. 1b, four VIUs are shown. It should be understood that the number and location of the VIUs in fig. 1b is merely one example. Those skilled in the art can select the appropriate number and location of VIUs according to the actual needs.
The vehicle integrated-unit VIU provides a plurality of vehicle components with some or all of the data processing functions or control functions required for the vehicle components. The VIU may have one or more of the following functions.
1. The electronic control function, VIU, is used to implement the electronic control function provided by an electronic control unit (electronic control unit, ECU) within some or all of the vehicle components. Such as control functions required for a certain vehicle component, and such as data processing functions required for a certain vehicle component.
2. The same function as the gateway, i.e., the VIU may also have some or all of the same functions as the gateway, such as a protocol conversion function, a protocol encapsulation and forwarding function, and a data format conversion function.
3. A data processing function for processing and computing data acquired from the actuators of a plurality of vehicle components.
The data related to the above functions may include operating data of actuators in the vehicle components, for example the motion parameters and working states of the actuators; it may also be data collected by data collecting units (for example sensors) of the vehicle components, for example road information of the road on which the vehicle is traveling, or weather information collected by the vehicle's sensors. This embodiment is not specifically limited in this respect.
In the method of this application, a vehicle, a robot, a server or the like first performs coarse positioning of the target (for example an unmanned vehicle or a robot), then obtains the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtains the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system, and finally obtains the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. In other words, the target is first coarsely positioned and then finely positioned by combining the image of the first marker with the coarse pose, which helps to obtain an ultra-high-precision global pose estimate of the target.
For example, coarse positioning is based on a laser radar and fine positioning is based on a visual camera. Laser-radar-based coarse positioning is performed first, followed by visual-camera-based fine positioning: the accuracy of laser radar positioning is 5-10 cm, while visual fine positioning can reach a positioning accuracy of about 1-2 cm. The combination of laser radar coarse positioning and visual fine positioning can meet customer demands and achieve high-precision, robust and cost-effective positioning of the target.
Having described the architecture of the embodiments of the present application, the following describes the methods of the embodiments of the present application in detail.
Referring to fig. 2, a flow chart of a target positioning method according to an embodiment of the present application is shown. Alternatively, the method may be applied to the aforementioned target positioning system, such as the target positioning system shown in fig. 1a. The target positioning method shown in fig. 2 may comprise steps 201-205. It should be understood that steps 201-205 are numbered for convenience of description only, and the numbering is not intended to limit the execution order. The embodiment of the present application does not limit the execution order, execution time, number of executions and the like of these steps. The execution body of the embodiment of the present application may be a vehicle, for example a vehicle-mounted device (such as an in-vehicle head unit), or a terminal device such as a mobile phone or a computer. It should be noted that the target positioning method provided in this application may be executed locally, or the image of the target and the image of the marker may be uploaded to the cloud for execution. The cloud may be implemented by a server, which may be a virtual server, a physical server or another device; this scheme does not strictly limit this. The following description takes a vehicle (e.g., an unmanned vehicle) as the execution body of steps 201-205 of the target positioning method, and the application is equally applicable to other execution bodies. Steps 201-205 are specifically as follows:
201. Coarsely positioning a target in a preset area to obtain the coarse pose of the target;
the preset area may be, for example, all areas of a substation, or a park, a forest, a family, a road, etc., and the present solution does not limit the area strictly.
The target may be, for example, a vehicle, a robot, or the like, or may be another autonomously movable or non-autonomously movable object, or the like, to which the present solution is not limited strictly.
Coarse positioning can be understood as an approximate estimation of the pose of the target.
Pose is understood as position and attitude (position & attitude), where the position is a translation in three degrees of freedom and the attitude is a rotation in three degrees of freedom. Pose estimation estimates the position and orientation of the target object, i.e. its 6DoF pose. In the general three-dimensional case, the positional translation is described by variables along the three coordinate axes X, Y and Z, and the orientation is described by the amounts of rotation about the same three axes.
In one possible implementation, the target is coarsely positioned with the HDL-localization algorithm, based on sensors such as the Lidar and the inertial measurement unit IMU in the vehicle, to obtain the coarse pose of the target. Of course, other algorithms may also be used, such as fast-lio-localization or Monte Carlo localization; this scheme is not strictly limited in this regard.
202. Acquiring an image of a first marker in the preset area, and obtaining a global pose of the first marker according to the rough pose of the target and the image of the first marker;
the marker can be any object in an application scene corresponding to the preset area, including but not limited to an electric box in a transformer substation, a telegraph pole in a park, a fir tree in a forest, a tea table in a home, a road side device on a road and the like, and the scheme is not strictly limited.
Optionally, the marker may be an asymmetric object with texture (for example, a complex texture). For example, when a target such as a vehicle reaches the coarse positioning range, the marker occupies roughly 1/10 to 1/2 of the picture.
In a possible implementation manner, the step of obtaining the global pose of the first marker according to the coarse pose of the target and the image of the first marker includes steps 2021-2023, specifically as follows:
2021. obtaining a local map of the position of the target according to the rough pose of the target and the global point cloud map of the preset area;
the global point cloud map of the preset area may be established by acquiring point cloud data of the preset area according to the point cloud data.
Optionally, the global point cloud map is established by using a synchronized localization and mapping (simultaneous localization and mapping, SLAM) method. For example, by running a SLAM program, point cloud data obtained by laser radar Lidar scanning is processed, and a global point cloud map is established based on an LIO-SAM mapping algorithm.
Of course, other algorithms may be used, such as, but not limited to, LOAM, lego-LOAM, FAST-LIO, and the like.
The coarse pose gives the approximate position of the vehicle or robot; for example, the region within a 10 cm radius threshold around the approximate position of the vehicle is intersected with the global point cloud map to obtain the local map.
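A minimal sketch of this cropping step is given below; the function name, the data layout (an N x 3 array of map points) and the default radius are illustrative assumptions, since the application only states that a neighbourhood around the coarse position is intersected with the global map:

```python
import numpy as np

def crop_local_map(global_points: np.ndarray, coarse_xyz: np.ndarray,
                   radius: float = 0.1) -> np.ndarray:
    """Return the points of the global point cloud map that lie within
    `radius` of the coarse position (0.1 m matches the 10 cm threshold quoted above)."""
    dists = np.linalg.norm(global_points[:, :3] - coarse_xyz[:3], axis=1)
    return global_points[dists <= radius]
```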
2022. Obtaining a semantic locating local map according to a local map of the position of the target and a semantic locating global map, wherein the semantic locating global map comprises global poses of M markers, the semantic locating local map comprises global poses of N markers, the N markers are markers in the M markers, and M and N are positive integers;
the semantic locating global map may be obtained by distinguishing semantic elements corresponding to the markers from semantic elements corresponding to the non-markers. The global pose of the marker is stored in a semantic locating global map and can be inquired by a subsequent algorithm module to obtain the global pose of the marker.
It should be noted that, the semantic locating global map is a map of all elements containing the same semantic in the preset area. For example, when the markers are tables, the semantically located global map may be understood as a semantically located global map of all tables.
Correspondingly, the semantic locating local map is a part of map in the semantic locating global map. The N markers in the semantic locating local map are markers in M markers in the semantic locating global map.
For example, the local map and the semantic locating global map are overlapped to obtain the semantic locating local map. For instance, the extent of the local map, padded by ±10 cm, is intersected with the semantic locating global map, so that the marker and the target fall within one local map; processing within this local map makes subsequent target positioning more accurate and efficient.
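One way to realize this overlap, sketched under our own assumptions about the data layout (each marker stored with its global position), is to keep only the markers whose positions fall inside the padded bounds of the local map:

```python
import numpy as np

def build_semantic_local_map(semantic_global_map, local_points: np.ndarray,
                             margin: float = 0.1):
    """Keep the N markers (out of M) whose global positions fall inside the
    axis-aligned bounds of the local map, padded by `margin` (e.g. 0.1 m)."""
    lo = local_points[:, :3].min(axis=0) - margin
    hi = local_points[:, :3].max(axis=0) + margin
    local_markers = []
    for marker in semantic_global_map:      # each entry holds e.g. {"id", "xyz", "rpy"}
        xyz = np.asarray(marker["xyz"])
        if np.all(xyz >= lo) and np.all(xyz <= hi):
            local_markers.append(marker)
    return local_markers
```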
In one possible implementation, the semantically localized global map may be obtained by the following steps A1-A2:
a1, carrying out three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain a three-dimensional model with textures of the M markers;
It will be appreciated that the image of the first marker may include only the first marker, or may include a plurality of markers having the same meaning as the first marker. That is, M may be 1 or an integer greater than 1.
A textured three-dimensional model of M markers including the first marker may be obtained based on the image.
Optionally, the three-dimensionally reconstructed marker model with real texture is obtained by photographing the marker from various angles with, for example, a depth camera, and then performing three-dimensional reconstruction using the BundleFusion algorithm. Of course, three-dimensional reconstruction may also be performed in other ways, such as Kintinuous or ElasticFusion; this scheme is not strictly limited in this regard.
A2, registering the three-dimensional models with textures of the M markers into the global point cloud map to obtain the semantic locating global map.
Based on the obtained three-dimensional model of the marker and the global point cloud map, the three-dimensional model of the marker is registered into the global point cloud map by a point cloud registration method. Optionally, the point cloud registration method may be, for example, the TEASER++ method.
Of course, other algorithms may also be used, such as the iterative closest point (Iterative Closest Point, ICP) algorithm; this scheme is not limited in this regard.
Wherein, through point cloud registration, the global pose of each marker in the semantically located global map is known. Alternatively, the global pose of each marker can be obtained by querying a database corresponding to the semantic localization global map.
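As a sketch only: this registration step can be approximated with a standard point-to-point ICP call, shown here with the Open3D library; the application names TEASER++ and ICP but does not prescribe an implementation, so the function below and its parameters are our assumptions:

```python
import numpy as np
import open3d as o3d

def register_marker_model(marker_pcd: o3d.geometry.PointCloud,
                          global_map_pcd: o3d.geometry.PointCloud,
                          init_T: np.ndarray, max_dist: float = 0.05) -> np.ndarray:
    """Refine the marker model's pose in the global map with point-to-point ICP.
    `init_T` is a rough initial 4x4 transform (e.g. derived from the coarse pose)."""
    result = o3d.pipelines.registration.registration_icp(
        marker_pcd, global_map_pcd, max_dist, init_T,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation   # 4x4 transform of the marker model in the global map
```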
For example: the marker may be an electric box; a first column in the semantic locating global map corresponds to an electric box in the actual substation scene, and its global pose includes: a 3-dimensional position x=24, y=6, z=0, and a 3-dimensional spatial orientation with pitch angle pitch=0°, heading angle yaw=90° and roll angle roll=0°.
It should be noted that the semantic locating global map of the markers may be obtained in advance and reused. The above is given by way of example only; the map may also be obtained in other ways, and this scheme is not strictly limited in this regard.
2023. And acquiring the global pose of the first marker from the semantic locating local map according to the image of the first marker.
Because a plurality of markers may exist in the semantic locating global map, in order to accurately acquire the first marker, a part of the markers are screened out by acquiring the semantic locating local map, then the first marker is determined from the semantic locating local map according to the image of the first marker, and then the global pose of the first marker is acquired.
For example, if 3 markers appear in the picture captured by the visual camera, the marker with the largest mask area is selected as the first marker; the other two markers with smaller mask areas are eliminated, leaving a unique marker, and the global pose of the first marker is then obtained from the semantic locating local map.
Of course, other methods may be used to determine the unique marker, and the present scheme is not limited in this regard.
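A sketch of the mask-area rule described above; the mask format (one boolean array per detected marker) is our assumption and is not specified in the application:

```python
import numpy as np

def pick_first_marker(masks) -> int:
    """Given one boolean segmentation mask per detected marker, return the
    index of the marker with the largest mask area (pixel count)."""
    areas = [int(np.count_nonzero(m)) for m in masks]
    return int(np.argmax(areas))
```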
203. Obtaining the pose of the first marker under a camera coordinate system according to the image of the first marker;
optionally, shooting is performed based on a camera carried on the vehicle or the robot so as to obtain an image of the first marker. Based on the image, the pose of the marker in the camera coordinate system can be obtained.
In one possible implementation, the pose of the marker under the camera coordinate system may be obtained by processing based on a deep learning method. For example, the image of the first marker is input into a preset model for processing to obtain the pose of the first marker under the camera coordinate system, wherein the training data of the preset model is obtained by performing one or more of replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement and reduction on the background in initial training image data, and/or performing one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement and reduction on the marker in the initial training image data.
The preset model may be any image processing model, and the present scheme is not limited thereto. For example, training the initial model based on a plurality of training image data, and continuously and iteratively adjusting parameters of the initial model based on a preset stopping condition until the stopping condition is reached, thereby obtaining a trained preset model. Alternatively, the stopping condition may be that the number of training times reaches 100 times, or the loss value is smaller than a preset threshold value, or the like.
In another possible implementation, the 6DoF pose of the marker with respect to the camera is obtained by inputting a two-dimensional image of the marker, for example captured by the camera, into an object positioning algorithm for processing.
Then, the pose of the marker relative to the camera is converted into the pose of the marker relative to the vehicle or robot based on the calibrated coordinate transformation determined by the self structure of the target such as the vehicle or robot.
The object positioning algorithm may be, for example, the PVNet algorithm. Of course, other algorithms are also possible, such as DenseFusion, and the present solution is not limited in this regard.
204. Acquiring the pose of the target under the camera coordinate system, and acquiring the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system;
The pose of the target in the camera coordinate system may be preset. Optionally, the camera is arranged on the vehicle or the robot in advance, so that the camera has a relative position with respect to the center of the vehicle or the robot, namely, the pose of the target under the camera coordinate system.
Based on the pose of the first marker under the camera coordinate system and the pose of the target under the camera coordinate system, the relative pose between the first marker and the target can be obtained through coordinate transformation.
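The coordinate transformation in this step can be written as a composition of homogeneous transforms. The sketch below (helper names and the Euler-angle convention are ours) computes the marker's pose in the target/body frame from the two camera-frame poses:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose_to_matrix(xyz, rpy_deg):
    """Build a 4x4 homogeneous transform from a position and roll/pitch/yaw in degrees."""
    T = np.eye(4)
    T[:3, :3] = R.from_euler("xyz", rpy_deg, degrees=True).as_matrix()
    T[:3, 3] = xyz
    return T

def relative_pose(T_cam_marker: np.ndarray, T_cam_target: np.ndarray) -> np.ndarray:
    """T_cam_marker: marker pose in the camera frame (e.g. from the preset model).
    T_cam_target: target (vehicle/robot body) pose in the camera frame, known from
    the calibrated camera mounting. Returns T_target_marker, the marker's pose in
    the target's body frame."""
    return np.linalg.inv(T_cam_target) @ T_cam_marker
```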
205. And obtaining the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
The global pose of the marker is combined with the pose of the marker relative to the vehicle or robot to obtain the global pose of the vehicle or robot, which is output as the final refined 6DoF pose.
For example, the global pose of the marker is chained with the pose of the marker relative to the vehicle or robot through conventional coordinate transformation, resulting in the global pose of the vehicle or robot. Of course, other methods may also be used to obtain the global pose of the target; this scheme is not strictly limited in this regard.
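In transform notation (ours, not the application's), writing T_{A,B} for the pose of frame B expressed in frame A, this final step is simply

```latex
T_{\mathrm{world},\,\mathrm{target}}
\;=\;
T_{\mathrm{world},\,\mathrm{marker}}\;
\bigl(T_{\mathrm{target},\,\mathrm{marker}}\bigr)^{-1}
```

where T_{world,marker} is the marker's global pose obtained in step 202 and T_{target,marker} is the relative pose obtained in step 204.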
In this method, a vehicle, a robot, a server or the like first performs coarse positioning of the target (for example an unmanned vehicle or a robot), then obtains the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtains the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system, and finally obtains the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. In other words, the target is first coarsely positioned and then finely positioned by combining the image of the first marker with the coarse pose, which helps to obtain an ultra-high-precision global pose estimate of the target.
For example, the above-described coarse positioning is performed based on a laser radar, and the fine positioning is performed based on a visual camera. In this scheme, laser-radar-based coarse positioning is performed first, followed by visual-camera-based fine positioning: the accuracy of laser radar positioning is 5-10 cm, while visual fine positioning can reach a positioning accuracy of about 1-2 cm. The combination of laser radar coarse positioning and visual fine positioning can meet customer demands and achieve high-precision, robust and cost-effective positioning of the target.
Referring to fig. 3, a flowchart of another object positioning method according to an embodiment of the present application is shown. Alternatively, the method may be applied to the aforementioned object positioning system, such as the object positioning system shown in fig. 1 a. The target positioning method as shown in fig. 3 may comprise steps 301-308. It should be understood that the description herein is presented in the order 301-308 for ease of description and is not intended to limit the execution to necessarily be performed in the order described above. The embodiment of the present application is not limited to the execution sequence, execution time, execution times, and the like of the one or more steps. The following description will take the execution subject of steps 301-308 of the object localization method as an example of a vehicle, and the application is equally applicable to other execution subjects, such as robots or servers. The steps 301-308 are specifically as follows:
301. Acquiring an image of a first marker in a preset area, and carrying out three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain a three-dimensional model with textures of the M markers;
alternatively, the image of the marker may be taken by a camera.
The description of step 301 is referred to in the foregoing embodiments, and will not be repeated here.
302. Acquiring point cloud data of the preset area, and establishing a global point cloud map according to the point cloud data;
the description of step 302 is referred to in the foregoing embodiments, and will not be repeated here.
303. Registering the three-dimensional models with textures of the M markers into the global point cloud map to obtain the semantic locating global map;
the description of step 303 is referred to in the foregoing embodiments, and is not repeated here.
Fig. 4 is a schematic diagram of target positioning according to an embodiment of the present application. The figure shows that the construction of the semantic locating map of the markers may be performed in an offline mode, i.e. steps 301-303 may be performed offline. The offline mode runs asynchronously, whereas the online mode runs in real time; the offline mode will typically run once before the online mode.
It should be noted that steps 301-303 may be performed only once, and the result can then be reused directly.
For example, after the semantic location global map is obtained, the semantic location global map can be applied to other various target locations of the preset area, such as other vehicle locations, robot locations, pedestrian locations, and the like, and the scheme is not limited strictly.
In this scheme, an object already existing in the specific application scene is selected as the positioning marker. Compared with the manually placed two-dimensional codes of the prior art, this scheme does not alter the specific application scene, reduces labor cost, and can be widely applied in the positioning field. By establishing an offline semantic locating global map, an ultra-high-precision global pose of the marker can be obtained, which improves the precision of the final refined pose of the target.
Fig. 4 shows steps 304-307 being performed in online mode. The method comprises the following steps:
304. coarse positioning is carried out on the target in the preset area, and a coarse pose of the target is obtained;
in one possible implementation, the target is coarsely positioned by the lidar of the vehicle. For the description of this step, reference may be made to the descriptions in the foregoing embodiments, and the description is omitted here.
Fig. 4 shows a way of coarsely positioning the target based on the Lidar and the inertial measurement unit IMU: the data obtained by laser radar scanning is downsampled and then de-distorted, point cloud registration is performed on the IMU data together with the de-distorted point cloud, and global optimization is then performed on the basis of the global map to obtain the coarse pose of the target.
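As an illustration of the downsampling stage only (the application does not specify a method or parameters; the Open3D voxel filter and the voxel size below are our choices):

```python
import open3d as o3d

def downsample_scan(scan_pcd: o3d.geometry.PointCloud,
                    voxel_size: float = 0.1) -> o3d.geometry.PointCloud:
    """Reduce the raw lidar scan to one point per voxel before de-distortion
    and registration; the 0.1 m voxel size is illustrative."""
    return scan_pcd.voxel_down_sample(voxel_size=voxel_size)
```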
305. Obtaining the global pose of the first marker according to the rough pose of the target and the image of the first marker;
alternatively, as shown in fig. 4, when the number of markers in the visual camera (camera) is greater than 1, the localization of the target may be performed by determining one marker and then based on the marker.
For example, if 3 markers are in the picture of the visual camera, the marker with the largest mask area can be selected, the other two markers with smaller mask areas are eliminated, and the unique markers are left for target positioning.
Specifically, the unique marker is subjected to marker positioning based on a semantic positioning local map, so that the global pose of the marker is obtained. For the description of this section, reference may be made to the description of the foregoing embodiments, and no further description is given here.
306. Obtaining the pose of the first marker under a camera coordinate system according to the image of the first marker;
in one possible implementation manner, the image of the first marker is input into a preset model for processing, so as to obtain the pose of the first marker under a camera coordinate system, wherein training data of the model is obtained by performing one or more of processing of replacing, gaussian blur, translation, clipping, contrast conversion, gamma conversion, amplifying and shrinking on a background in initial training image data, and performing one or more of processing of Gaussian blur, translation, clipping, contrast conversion, gamma conversion, amplifying and shrinking on the marker in the initial training image data.
For example, data augmentation is performed on the training data by transforming the viewing angle and background of the observed markers. Specifically, the background is removed from two-dimensional pictures of the markers taken at intervals of 10° horizontally and 10° vertically and replaced with other backgrounds from the specific application scene. Both the background and the marker can then be subjected to augmentation operations such as Gaussian blur (Gaussian kernel size 1-5), translation (random vertical and horizontal translation range 1-30 pixels), cropping (random vertical and horizontal cropping range 1-30 pixels), contrast transformation (random contrast adjustment range ±20%), gamma transformation (gamma parameter 0.01-0.2), enlargement (random enlargement ratio 1-10%) and reduction (random reduction ratio 1-10%). Training the model with this deep-learning-based approach makes the algorithm more stable, more accurate and more robust.
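The following sketch applies one randomly chosen operation from the ranges quoted above; the OpenCV calls, the sampling strategy and the function name are our own illustrative choices, not part of the application:

```python
import random
import cv2
import numpy as np

def augment(img: np.ndarray) -> np.ndarray:
    """Apply one random augmentation drawn from the ranges quoted in the text."""
    h, w = img.shape[:2]
    op = random.choice(["blur", "shift", "contrast", "gamma", "scale"])
    if op == "blur":
        k = random.choice([1, 3, 5])                        # Gaussian kernel size 1-5 (odd)
        return cv2.GaussianBlur(img, (k, k), 0)
    if op == "shift":
        dx, dy = random.randint(1, 30), random.randint(1, 30)  # 1-30 pixel translation
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        return cv2.warpAffine(img, M, (w, h))
    if op == "contrast":
        alpha = random.uniform(0.8, 1.2)                    # contrast adjustment +/- 20 %
        return cv2.convertScaleAbs(img, alpha=alpha, beta=0)
    if op == "gamma":
        gamma = random.uniform(0.01, 0.2)                   # gamma range quoted in the text
        return np.clip(255.0 * (img / 255.0) ** gamma, 0, 255).astype(np.uint8)
    scale = 1.0 + random.uniform(-0.1, 0.1)                 # enlarge/shrink by up to 10 %
    return cv2.resize(img, (int(w * scale), int(h * scale)))
```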
307. Acquiring the pose of the target under the camera coordinate system, and acquiring the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system;
for the description of this section, reference may be made to the description of the foregoing embodiments, and no further description is given here.
308. And obtaining the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
The global pose of the first marker is chained with the pose of the first marker relative to the vehicle or robot through conventional coordinate transformation, so as to obtain the global pose of the vehicle or robot. Of course, other methods may also be used to obtain the global pose of the target; this scheme is not strictly limited in this regard.
The embodiment provides a brand new high-precision positioning method for fusing the laser radar with the vision sensor. First, a semantically located global map is built in an off-line manner. Then, an online positioning mode is started (for example, a robot or an autonomous vehicle performs rough positioning based on a laser radar and then performs fine positioning based on a visual camera). This can help to obtain a global pose estimate of the target with ultra-high accuracy.
By adopting the mode of separating the off-line module from the on-line module, the calculation power consumption of the on-line module can be greatly reduced, so that the hardware cost of a vehicle or a robot is greatly reduced, and the endurance is greatly improved.
On the one hand, using the laser radar for coarse positioning ensures that, when visual fine positioning is required, the minimum operating requirement that the marker occupies more than 1/10 of the picture can be met; if only the visual fine positioning module were used, this minimum start-up requirement could not be guaranteed.
On the other hand, the accuracy of laser radar positioning is 5-10 cm, which is difficult to meet a customer's high-precision requirement of 1-2 cm, while visual fine positioning can reach a positioning accuracy of about 1-2 cm. The combination of laser radar coarse positioning and visual fine positioning can meet customer requirements.
It should be noted that this solution can be widely applied in fields such as unmanned vehicles and robot positioning, for example in a large number of scenarios such as power inspection, unmanned taxis, garden inspection, oil and gas inspection, geological exploration, logistics transportation, home services and unmanned nucleic acid testing. Of course, other fields or scenarios are also possible, and this solution is not limited thereto.
It should be noted that, in the various embodiments of the present application, if there is no specific description or logic conflict, terms and/or descriptions between the various embodiments have consistency and may refer to each other, and technical features in different embodiments may be combined to form a new embodiment according to their inherent logic relationship.
The foregoing details the method of the embodiments of the present application; the apparatus of the embodiments of the present application is provided below. It should be understood that, in the embodiments of the present application, the division into a plurality of units or modules is only a logical division according to functions and does not limit the specific structure of the apparatus. In a specific implementation, some functional modules may be subdivided into finer-grained functional modules, and some functional modules may be combined into one functional module; regardless of whether the functional modules are subdivided or combined, the general flow performed by the apparatus is the same. For example, some devices include a receiving unit and a transmitting unit. In some designs, the transmitting unit and the receiving unit may also be integrated into a communication unit, which implements the functions of both the receiving unit and the transmitting unit. Typically, each unit corresponds to respective program code (or program instructions); when the respective program code of a unit runs on a processor, the unit executes a corresponding flow under control of the processor, thereby implementing a corresponding function.
Embodiments of the present application also provide an apparatus for implementing any of the above methods, for example, providing a target positioning apparatus including a module (or means) to implement steps performed by a vehicle in any of the above methods.
For example, referring to fig. 5, a schematic structural diagram of a target positioning device according to an embodiment of the present application is shown. The target positioning device is used for realizing the target positioning method, such as the target positioning methods shown in fig. 2 and 3.
As shown in fig. 5, the apparatus may include a coarse positioning module 501, a first processing module 502, a second processing module 503, a third processing module 504, and a positioning module 505, which are specifically as follows:
the coarse positioning module 501 is configured to perform coarse positioning on a target in a preset area, so as to obtain a coarse pose of the target;
the first processing module 502 is configured to obtain an image of a first marker in the preset area, and obtain a global pose of the first marker according to the coarse pose of the target and the image of the first marker;
a second processing module 503, configured to obtain a pose of the first marker in a camera coordinate system according to the image of the first marker;
a third processing module 504, configured to obtain a pose of the target in the camera coordinate system, and obtain a relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system;
The positioning module 505 is configured to obtain a global pose of the target according to the global pose of the first marker and a relative pose between the first marker and the target.
In one possible implementation manner, the first processing module 502 is configured to:
obtaining a local map of the position of the target according to the coarse pose of the target and the global point cloud map of the preset area;
obtaining a semantic locating local map according to a local map of the position of the target and a semantic locating global map, wherein the semantic locating global map comprises global poses of M markers, the semantic locating local map comprises global poses of N markers, the N markers are markers in the M markers, and M and N are positive integers;
and acquiring the global pose of the first marker from the semantic locating local map according to the image of the first marker.
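Purely as an illustration of this map handling, a minimal Python sketch is given below; the data structures, the local-map radius, and the assumption that the observed first marker has already been resolved to an identifier are choices of this sketch, not of the embodiment.

```python
# A minimal sketch of the map handling performed by the first processing
# module. Assumptions of this sketch: the global point cloud map is an
# (N, 3) NumPy array, the semantic locating global map is a dict mapping
# marker identifiers to 4x4 global poses, and the radius is an example value.
import numpy as np

def local_map(global_points: np.ndarray, coarse_pose: np.ndarray,
              radius: float = 20.0) -> np.ndarray:
    # Crop the global point cloud map around the coarse position of the target.
    center = coarse_pose[:3, 3]
    mask = np.linalg.norm(global_points - center, axis=1) < radius
    return global_points[mask]

def semantic_local_map(semantic_global_map: dict, coarse_pose: np.ndarray,
                       radius: float = 20.0) -> dict:
    # Keep only the N markers (out of M) whose global poses fall in the local region.
    center = coarse_pose[:3, 3]
    return {marker_id: pose for marker_id, pose in semantic_global_map.items()
            if np.linalg.norm(pose[:3, 3] - center) < radius}

def first_marker_global_pose(semantic_local: dict, marker_id) -> np.ndarray:
    # Look up the global pose of the first marker observed in the image.
    return semantic_local[marker_id]
```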
In a possible implementation manner, the apparatus further includes a fourth processing module, configured to:
performing three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain a three-dimensional model with textures of the M markers;
Registering the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic locating global map.
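As one possible illustration of this registration step, a textured marker model could be aligned to the global point cloud map with ICP, for example using Open3D; the following sketch and its parameter choices are assumptions, not the embodiment's prescribed procedure.

```python
# A minimal sketch of registering a reconstructed, textured marker model into
# the global point cloud map using ICP via Open3D. The file paths, sampled
# point count, ICP threshold, and initial guess are assumptions of this sketch.
import numpy as np
import open3d as o3d

def register_marker(marker_mesh_path: str, global_map_path: str,
                    init_guess: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    mesh = o3d.io.read_triangle_mesh(marker_mesh_path)            # textured 3D model of a marker
    marker_pcd = mesh.sample_points_uniformly(number_of_points=5000)
    global_map = o3d.io.read_point_cloud(global_map_path)         # global point cloud map

    # Refine the marker's placement in the global map with point-to-point ICP.
    result = o3d.pipelines.registration.registration_icp(
        marker_pcd, global_map, threshold, init_guess,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    # The resulting transformation gives the marker's global pose, which can be
    # recorded in the semantic locating global map.
    return result.transformation
```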
In a possible implementation manner, the second processing module 503 is further configured to:
inputting the image of the first marker into a preset model for processing to obtain the pose of the first marker under a camera coordinate system, wherein training data of the preset model is obtained by performing one or more of substitution, Gaussian blur, translation, clipping, contrast conversion, gamma conversion, amplification and reduction on a background in initial training image data and/or performing one or more of Gaussian blur, translation, clipping, contrast conversion, gamma conversion, amplification and reduction on the marker in the initial training image data.
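Purely as an illustration of how such a preset model might be used online, a minimal inference sketch follows; the network architecture and output layout are assumptions of this sketch and are not specified by the embodiment.

```python
# A minimal online inference sketch for the preset model; the architecture
# and the (translation + quaternion) output layout are assumptions of this
# sketch, not details of the embodiment.
import torch

def estimate_marker_pose_in_camera(model: torch.nn.Module,
                                   image: torch.Tensor):
    # image: a (1, 3, H, W) tensor containing the image of the first marker
    model.eval()
    with torch.no_grad():
        out = model(image)          # assumed shape (1, 7): tx, ty, tz, qx, qy, qz, qw
    t = out[0, :3]                  # translation of the marker in the camera frame
    q = out[0, 3:]
    return t, q / q.norm()          # normalized quaternion orientation
```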
The coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 may be implemented by software or by hardware. Illustratively, the coarse positioning module 501 is taken as an example to describe the implementation of a module. Similarly, the implementations of the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 may refer to the implementation of the coarse positioning module 501.
Taking a module as an example of a software functional unit, the coarse positioning module 501 may comprise code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, a container, and the like. Further, there may be one or more such computing instances. For example, the coarse positioning module 501 may include code that runs on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers for running the code may be distributed in the same region or in different regions. Further, the multiple hosts/virtual machines/containers for running the code may be distributed in the same availability zone (availability zone, AZ) or in different AZs, each AZ comprising one data center or multiple geographically close data centers. Typically, a region may comprise a plurality of AZs.
Also, the multiple hosts/virtual machines/containers for running the code may be distributed in the same virtual private cloud (virtual private cloud, VPC) or in multiple VPCs. In general, one VPC is disposed within one region, and a communication gateway is disposed in each VPC for implementing interconnection between VPCs in the same region and between VPCs in different regions.
Taking a module as an example of a hardware functional unit, the coarse positioning module 501 may include at least one computing device, such as a server. Alternatively, the coarse positioning module 501 may be a device implemented using an application-specific integrated circuit (ASIC), a programmable logic device (programmable logic device, PLD), or the like. The PLD may be implemented as a complex programmable logic device (complex programmable logical device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), generic array logic (generic array logic, GAL), or any combination thereof.
The coarse positioning module 501 may include multiple computing devices, and these computing devices may be distributed in the same region or in different regions. Likewise, the multiple computing devices included in the coarse positioning module 501 may be distributed in the same AZ or in different AZs, and may be distributed in the same VPC or in multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
It should be noted that, in other embodiments, the coarse positioning module 501 may be configured to perform any step in the target positioning method, and the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 may likewise be configured to perform any step in the target positioning method. The steps that the coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 are responsible for implementing may be specified as needed, and these modules respectively implement different steps of the target positioning method so as to realize all functions of the target positioning apparatus.
It should be understood that the division of the modules in the above apparatus is only a division of logical functions; in actual implementation, the modules may be fully or partially integrated into one physical entity or may be physically separated. Furthermore, the modules in the target positioning apparatus may be implemented in the form of a processor invoking software. For example, the target positioning apparatus includes a processor connected to a memory in which instructions are stored, and the processor invokes the instructions stored in the memory to perform any of the above methods or to realize the functions of the modules of the apparatus, where the processor is, for example, a general-purpose processor such as a central processing unit (central processing unit, CPU) or a microprocessor, and the memory is either an internal memory of the apparatus or an external memory of the apparatus. Alternatively, the modules in the apparatus may be implemented in the form of hardware circuits, and the functions of some or all of the modules may be realized by the design of the hardware circuits, which may be understood as one or more processors. For example, in one implementation, the hardware circuit is an application-specific integrated circuit (ASIC), and the functions of some or all of the above modules are realized by designing the logical relationships of the elements within the circuit. For another example, in another implementation, the hardware circuit may be implemented by a programmable logic device (programmable logic device, PLD), for example a field programmable gate array (field programmable gate array, FPGA), which may include a large number of logic gates whose connection relationships are configured by a configuration file so as to realize the functions of some or all of the above modules. All modules of the above apparatus may be realized in the form of a processor invoking software, entirely in the form of hardware circuits, or partly in the form of a processor invoking software with the rest in the form of hardware circuits.
Referring to fig. 6, a schematic diagram of the hardware structure of a computing device according to an embodiment of the present application is shown. As shown in fig. 6, the computing device 600 includes: a bus 602, a processor 604, a memory 606, and a communication interface 608. The processor 604, the memory 606, and the communication interface 608 communicate via the bus 602. The computing device 600 may be a server or a terminal device. It should be understood that the number of processors and memories in the computing device 600 is not limited in the present application.
The bus 602 may be a peripheral component interconnect (peripheral component interconnect, PCI) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one line is shown in fig. 6, but this does not mean that there is only one bus or only one type of bus. The bus 602 may include a path for transferring information between the components of the computing device 600 (e.g., the memory 606, the processor 604, and the communication interface 608).
The processor 604 may include any one or more of a central processing unit (central processing unit, CPU), a graphics processor (graphics processing unit, GPU), a Microprocessor (MP), or a digital signal processor (digital signal processor, DSP).
The memory 606 may include a volatile memory, such as a random access memory (random access memory, RAM). The memory 606 may also include a non-volatile memory, such as a read-only memory (read-only memory, ROM), a flash memory, a mechanical hard disk drive (HDD), or a solid state drive (solid state drive, SSD).
The memory 606 has stored therein executable program code that is executed by the processor 604 to implement the functions of the coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505, respectively, to thereby implement the target positioning method. That is, the memory 606 has instructions stored thereon for performing the object localization method.
Communication interface 608 enables communication between computing device 600 and other devices or communication networks using transceiver modules such as, but not limited to, network interface cards, transceivers, and the like.
It should be noted that although computing device 600 shown in fig. 6 illustrates only bus 602, processor 604, memory 606, and communication interface 608, those skilled in the art will appreciate that computing device 600 also includes other components necessary to achieve proper operation in a particular implementation. Also, as will be appreciated by those of skill in the art, the computing device 600 may also include hardware devices that implement other additional functions, as desired. Furthermore, those skilled in the art will appreciate that computing device 600 may also include only the necessary components to implement embodiments of the present application, and not necessarily all of the components shown in FIG. 6.
The embodiment of the application also provides a computing device cluster. The cluster of computing devices includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop, notebook, or smart phone.
As shown in fig. 7, the cluster of computing devices includes at least one computing device 600. The same instructions for performing the target positioning method may be stored in the memory 606 in one or more computing devices 600 in the computing device cluster.
In some possible implementations, the memory 606 of one or more computing devices 600 in the computing device cluster may also each have stored therein a portion of instructions for performing the object localization method. In other words, a combination of one or more computing devices 600 may collectively execute instructions for performing a target positioning method.
It should be noted that the memory 606 in different computing devices 600 in the computing device cluster may store different instructions for performing part of the functions of the object positioning apparatus. That is, the instructions stored by the memory 606 in the different computing devices 600 may implement the functionality of one or more of the coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505.
In some possible implementations, one or more computing devices in the computing device cluster may be connected through a network, which may be a wide area network, a local area network, or the like. For example, a first computing device and a second computing device are connected through the network, specifically via the communication interface in each computing device. In this type of possible implementation, instructions for performing the functions of the coarse positioning module 501 are stored in the memory of the first computing device, while instructions for performing the functions of the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 are stored in the memory of the second computing device.
Embodiments of the present application also provide a computer-readable storage medium having instructions stored therein which, when run on a computer or processor, cause the computer or processor to perform one or more steps of any of the methods described above.
Embodiments of the present application also provide a computer program product comprising instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.
It should be understood that in the description of the present application, unless otherwise indicated, "/" indicates an "or" relationship between the associated objects; for example, A/B may represent A or B, where A and B may each be singular or plural. Also, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "At least one of" the following items or the like means any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural. In addition, in order to clearly describe the technical solutions of the embodiments of the present application, the words "first", "second", and the like are used in the embodiments of the present application to distinguish between identical or similar items having substantially the same functions and effects. It will be appreciated by those skilled in the art that the words "first", "second", and the like do not limit the quantity or order of execution, and are not necessarily different. Meanwhile, in the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, an illustration, or a description. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete fashion that can be readily understood.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between each other may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solution of this embodiment.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When software is used for implementation, the implementation may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any usable medium accessible to the computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic medium such as a floppy disk, a hard disk, a magnetic tape, or a magnetic disk, an optical medium such as a digital versatile disc (digital versatile disc, DVD), a semiconductor medium such as a solid state disk (SSD), or the like.
The foregoing is merely a specific implementation of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered by the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method of locating a target, comprising:
coarsely positioning a target in a preset area to obtain the coarse pose of the target;
acquiring an image of a first marker in the preset area, and obtaining a global pose of the first marker according to the coarse pose of the target and the image of the first marker;
obtaining the pose of the first marker under a camera coordinate system according to the image of the first marker;
acquiring the pose of the target under the camera coordinate system, and acquiring the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system;
and obtaining the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
2. The method of claim 1, wherein the obtaining the global pose of the first marker from the coarse pose of the target and the image of the first marker comprises:
obtaining a local map of the position of the target according to the coarse pose of the target and the global point cloud map of the preset area;
obtaining a semantic locating local map according to a local map of the position of the target and a semantic locating global map, wherein the semantic locating global map comprises global poses of M markers, the semantic locating local map comprises global poses of N markers, the N markers are markers in the M markers, and M and N are positive integers;
and acquiring the global pose of the first marker from the semantic locating local map according to the image of the first marker.
3. The method according to claim 2, wherein the method further comprises:
performing three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain a three-dimensional model with textures of the M markers;
registering the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic locating global map.
4. A method according to any one of claims 1 to 3, wherein the deriving the pose of the first marker in the camera coordinate system from the image of the first marker comprises:
inputting the image of the first marker into a preset model for processing to obtain the pose of the first marker under a camera coordinate system, wherein training data of the preset model is obtained by performing one or more of substitution, Gaussian blur, translation, clipping, contrast conversion, gamma conversion, amplification and reduction on a background in initial training image data and/or performing one or more of Gaussian blur, translation, clipping, contrast conversion, gamma conversion, amplification and reduction on the marker in the initial training image data.
5. A target positioning device, comprising:
the coarse positioning module is used for performing coarse positioning on the target in the preset area to obtain the coarse pose of the target;
the first processing module is used for acquiring an image of a first marker in the preset area and obtaining a global pose of the first marker according to the coarse pose of the target and the image of the first marker;
The second processing module is used for obtaining the pose of the first marker under a camera coordinate system according to the image of the first marker;
the third processing module is used for acquiring the pose of the target under the camera coordinate system and obtaining the relative pose between the first marker and the target according to the pose of the target under the camera coordinate system and the pose of the first marker under the camera coordinate system;
and the positioning module is used for obtaining the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
6. The apparatus of claim 5, wherein the first processing module is configured to:
obtaining a local map of the position of the target according to the coarse pose of the target and the global point cloud map of the preset area;
obtaining a semantic locating local map according to a local map of the position of the target and a semantic locating global map, wherein the semantic locating global map comprises global poses of M markers, the semantic locating local map comprises global poses of N markers, the N markers are markers in the M markers, and M and N are positive integers;
And acquiring the global pose of the first marker from the semantic locating local map according to the image of the first marker.
7. The apparatus of claim 6, further comprising a fourth processing module to:
performing three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain a three-dimensional model with textures of the M markers;
registering the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic locating global map.
8. The apparatus of any one of claims 5 to 7, wherein the second processing module is further configured to:
inputting the image of the first marker into a preset model for processing to obtain the pose of the first marker under a camera coordinate system, wherein training data of the preset model is obtained by performing one or more of substitution, Gaussian blur, translation, clipping, contrast conversion, gamma conversion, amplification and reduction on a background in initial training image data and/or performing one or more of Gaussian blur, translation, clipping, contrast conversion, gamma conversion, amplification and reduction on the marker in the initial training image data.
9. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory; wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any one of claims 1 to 4.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is executed by a processor to implement the method of any one of claims 1 to 4.
11. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to perform the method according to any of claims 1 to 4.