CN114782626A

CN114782626A - Transformer substation scene mapping and positioning optimization method based on laser and vision fusion

Info

Publication number: CN114782626A
Application number: CN202210393283.5A
Authority: CN
Inventors: 屠铭扬; 郭志民; 曾平良; 田杨阳; 吴秋轩; 毛万登; 经韬; 刘昊; 库永恒; 孟秦源; 张波涛; 袁少光; 刘善峰; 梁允
Original assignee: State Grid Corp of China SGCC; Hangzhou Dianzi University; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Hangzhou Dianzi University; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd
Priority date: 2022-04-14
Filing date: 2022-04-14
Publication date: 2022-07-22
Anticipated expiration: 2042-04-14
Also published as: CN114782626B

Abstract

The invention discloses a transformer substation scene mapping and positioning optimization method based on laser and vision fusion, which comprises the steps of firstly, acquiring single line laser radar, an inertial sensor IMU, a milemeter and binocular image data, and preprocessing the data; then, local positioning and map building are carried out, and a sub-map is constructed; while carrying out laser thread positioning and drawing construction, synchronously carrying out processing on binocular images, namely a visual thread, realizing characteristic point tracking and judging key frames; then carrying out target detection on the obtained key frame; finally, generating a semantic landmark and projecting the semantic landmark to a sub-map; and optimizing the sub-map, and performing global optimization. The method combines the advantages of a positioning algorithm based on graph optimization and deep learning, can obtain a complete navigation map for stable positioning, greatly shortens the characteristic matching time, greatly improves the positioning stability of the transformer substation in a complex operation and maintenance scene, and enables the navigation in the transformer substation scene to be faster and more reliable.

Description

Transformer substation scene mapping and positioning optimization method based on laser and vision fusion

Technical Field

The invention relates to the field of SLAM in a complex scene of laser and vision fusion, and aims to efficiently and stably establish a diagram and position a transformer substation in a complex environment.

Background

With the economic development of China and the improvement of the living standard of people, the electricity consumption of society is continuously improved, and the power industry is rapidly developed. The links of power systems such as power transformation, power transmission, power distribution and the like are very important links in a power grid. Therefore, advanced field operation and maintenance technology is urgently needed for the transformer substation. This has put forward higher requirement to the construction of transformer substation, and the intellectuality of transformer substation reforms into for the hot problem.

The inspection of the substation equipment is an important method for ensuring the normal operation of the substation, the type and model of the substation equipment, and the position and the posture of the substation equipment are mostly manually identified at present, and some difficulties and defects exist in the process:

(1) the manual inspection has the defects of high labor intensity, low working efficiency and inaccuracy in detection quality and data. Due to the characteristics of high voltage, dense station equipment, large volume and heavy weight, the development of manual live working is difficult. And the transformer substation can cause certain harm to human bodies and is not beneficial to human health.

(2) If carry out the outage to equipment when the manual work is patrolled and examined and handle to the personal safety of guarantee patrolling and examining personnel, nevertheless along with the integration and the centralization of transformer substation, the outage carries out off-line operation and maintenance to equipment and can cause very big influence to domestic power consumption and industrial power consumption.

(3) The transformer substation has the advantages of large occupied area, more equipment types, various equipment arrangement, and different equipment posture positions even if the equipment is of the same type in the same category.

Therefore, it is urgent to use the robot of patrolling and examining to replace artifical patrolling and examining to equipment in the station, is favorable to smart power grids's construction, can reduce staff's amount of labour on the one hand like this, reduces the manpower, and on the other hand also can improve the efficiency of transformer substation's fortune dimension and transformer substation.

In order to enable the robot to autonomously perform routing inspection in the substation, the SLAM (Simultaneous Localization And Mapping) technology is a core technology. The technology is a system for generating positioning and scene map information of the position and the posture of the robot by calculating data acquired by equipment carried by the robot. Currently mainstream SLAM systems can be divided into two major categories, laser SLAM and visual SLAM. The laser SLAM is mature through development for many years, and the core of the method is to calculate and match point cloud data obtained by continuous scanning of a laser radar, calculate the relative movement distance and posture change of the laser radar, realize the positioning of the robot and further complete the construction of a positioning map. The laser SLAM scheme may use a single line lidar or a multi-line lidar. The single-line laser radar can well complete the task of 2D SLAM, and the existing algorithms such as gmaping represented by a filter scheme or Cartographer represented by graph optimization are excellent, but the disadvantages are also obvious: because the 2D SLAM naturally lacks information of most scenes due to the limitation of its dimension, its application is very limited, and it is generally used for indoor positioning. The defects of a single-line radar are compensated in the aspect of dimensionality by the multi-line laser radar, the existing open source scheme is rich, such as LOAM, LEGO-LOAM and the like, but with the increase of point cloud data, the processing of a large amount of point cloud data causes great load on a small computer, and the high selling price of the multi-line laser radar limits a lot of low-cost projects to adopt the scheme. The thinking of the visual SLAM algorithm is different from that of the laser SLAM algorithm, but because the input of the visual SLAM algorithm is an image and is not dense point cloud, the characteristic point extraction needs to be carried out on the image, the requirement on a computer is higher, the existing algorithm is like ORB-SLAM series, Vins-mono and the like, but the precision of the existing visual algorithm cannot reach the degree of the laser algorithm, and the established sparse map is difficult to be used for navigation. Therefore, the multi-sensor fusion scheme of vision and laser is the current research hotspot.

Disclosure of Invention

In order to realize that the transformer substation inspection robot inspects various types of equipment of the transformer substation, ensure that the inspection robot can accurately, quickly and stably position and navigate, and consider the actual complex operation and maintenance environment of the transformer substation, the two-dimensional laser radar has weak perception on objects such as fences, equipment supports and the like of an equipment area, the phenomenon that laser penetrates through the equipment from gaps occurs, and great influence is caused on readability of subsequent construction drawings and positioning stability. Most of the existing two-dimensional algorithm frameworks are based on ideal environments, such as indoor environments or outdoor scenes with simple edge features, and can obtain ideal positioning and mapping effects. However, the algorithms have some disadvantages in the complex operation and maintenance environment of the substation, such as a slow positioning speed, and even a positioning failure. The invention provides a visual fusion SLAM scheme based on a single line laser radar, solves the problems that the single line laser radar cannot sense a three-dimensional barrier and laser penetrates through the single line laser radar, and optimizes the actual complex scene of a transformer substation.

Based on the challenges and problems, the robot autonomous inspection task under the complex operation and maintenance scene of the transformer substation is realized efficiently on the basis of low cost. The invention provides a SLAM scheme based on a single line laser radar and integrating a binocular camera. In order to meet the requirement of the robot in the inspection of the transformer substation, the grid map is constructed for the transformer substation scene in a single-line laser radar map optimization mode, and the grid map can be positioned and navigated. The binocular camera processes image data while the single-line laser radar operates, the target detection thread reconstructs a three-dimensional model of the substation equipment by identifying ORB feature points extracted by the tracking thread to assist the robot in navigation and obstacle avoidance, and a grid map constructed by the laser thread is compensated. The method has the key point that map distortion and unstable positioning caused by insufficient dimensionality in a transformer substation scene are compensated and optimized by processing single-line laser data and using a deep learning method under the assistance of image data.

The invention realizes the functions: the inspection robot used by the invention is provided with a single line laser radar and a binocular camera, and an inertial sensor and a wheel speed meter are arranged at the physical center of the robot. According to the invention, the transformer substation scene is scanned by using the airborne single-line laser radar and the binocular camera, the three-dimensional reconstruction of common transformer substation equipment is realized through the camera, the radar data is compensated and enhanced aiming at the complex scene of the transformer substation, and the processed sub-map is used for constructing the global map for stable positioning and navigation.

A transformer substation scene mapping and positioning optimization method based on laser and vision fusion comprises the following steps:

step 1: acquiring single-line laser radar, an inertial sensor IMU, a milemeter and binocular image data for preprocessing;

step 2: carrying out local positioning and map building to build a sub map;

and step 3: while carrying out laser thread positioning and drawing construction, synchronously carrying out processing on binocular images, namely a visual thread, realizing characteristic point tracking and judging key frames;

and 4, step 4: performing target detection on the key frame obtained in the step 3;

and 5: generating a semantic landmark and projecting the semantic landmark to a sub-map;

step 6: and optimizing the sub-map and performing global optimization.

The specific method of the step 1 is as follows:

the substation inspection robot scans the environment of the whole substation through the carried single-line laser radar to obtain point cloud information, namely point cloud data, of a single plane of the substation. When the inspection robot moves, the wheel speed meter and the inertial sensor IMU which are corrected in advance through a calibration algorithm are used for obtaining the wheel speed and IMU information of the inspection robot, and the real-time speed and posture of the robot are obtained through calculation and used for removing the motion distortion of the laser radar. The calibrated binocular camera is used for reducing the influence of image distortion on pose estimation, and the binocular camera is started when the inspection robot is started for acquiring binocular image data. And then, filtering the obtained binocular image, so that the feature points in the binocular image can be better found by a subsequent algorithm.

The specific method of the step 2 is as follows:

and inputting the speed and attitude data of the robot obtained by the wheel speed meter and the inertial sensor into a Pose extractor, and solving to obtain the predicted Pose of the inspection robot. And inputting the laser radar data without motion distortion and the pose estimated by the pose inference engine into a scanning matcher Scan Matching, and completing optimization calculation by using a Ceres library to obtain an observed value of the pose of the robot. The observation value is fed back to the pose inference engine to correct the estimation value, and is provided to the Motion Filter to judge whether the robot is in a Motion state. To avoid inserting too much scan frame data per sub-map, once the scan matcher generates a new observation pose, the motion filter calculates its variance from the previous observation pose. And if the change exceeds a set change threshold value, the motion filter matches the current observation pose with the current predicted pose through a correlation scanning matching CSM method to obtain real pose data of the robot, performs optimization solution through nonlinear least squares to obtain optimal matching, and obtains actual coordinates of each laser point in the point cloud data at the moment. When a sub map starts to be maintained, the coordinate origin of the sub map is the coordinate origin of the real pose of the first frame. And converting the laser point cloud into a local coordinate system of the current sub-map, inserting the laser point cloud into the sub-map which is maintained, and outputting a result, wherein the sub-map comprises the position and the laser scanning data of the inspection robot with time information. And continuously inserting data frames while optimizing the pose to update the sub-map, scoring the real pose of each frame inserted into the sub-map, obtaining the optimal pose when the sub-map is created, and considering that the sub-map is created when the sub-map is not updated any more. In addition, an interface is reserved for visual fusion in the sub-map construction process, and the visual thread can obtain the permission for modifying the sub-map through the interface.

The specific method of step 3 is as follows:

the binocular image processing method based on the laser thread positioning comprises the steps that the laser thread positioning and the image building are carried out, meanwhile, the binocular image processing, namely the vision thread, is carried out synchronously, and the vision thread comprises a tracking thread and a target detection thread. When a new frame of image is input, the tracking thread is entered. In the tracking thread, ORB (organized FAST and Rotated BRIEF) feature extraction is firstly carried out on the left image and the right image of the frame image, after feature points are extracted, matching is carried out on the left image and the right image once, and only the feature points which are successfully matched are reserved for depth calculation. And repeating the step after the next frame of image enters the tracking thread, matching the obtained characteristic points with the characteristic points of the previous frame of image, and then optimizing the current camera pose by using the matched characteristic points. And after the pose of the camera is obtained, converting the pose into a robot pose, aligning the robot pose with a robot track obtained by laser scanning calculation, and obtaining the coordinates of each point cloud under the world coordinates. In order to ensure the quality of key frame tracking and avoid introducing excessive information redundancy, only when the point cloud number contained in a certain frame image exceeds a set point cloud number threshold and the repeatability of the point cloud is lower than the set repeatability threshold, the point cloud is regarded as a frame key frame.

The specific method of the step 4 is as follows:

and performing target detection on the key frames extracted from the tracking thread, namely performing target detection on the key frames. The target detection algorithm uses FASTER R-CNN, which is trained using the dataset of medium and large near-earth devices in the substation for detecting keyframes. When it detects a device in the training dataset, it outputs the classification result and the regression result, i.e. the label and coordinates of the prediction box.

The specific method of step 5 is as follows:

when the target detection program detects the device in a certain key frame, the target detection program performs target detection on ten adjacent frame data before and after the key frame to perform false judgment detection. If the same device can be detected by eight frames of images, the target detection is considered to be successful, otherwise, the detection is considered to be invalid. Because the projection sizes of the cameras facing the object are different under different viewing angles, the sizes of the object detection frames in different frames can be changed along with the movement of the cameras. And when the target detection is successful, solving through the poses of the adjacent frames of the camera in the world coordinate system and the sizes and coordinates of the 2D detection frames of the detected equipment in the adjacent frames to obtain the 3D cube-form semantic landmark of the equipment in the world coordinate system. The semantic landmark is represented by a 9-dimensional vector, including the pose (x, y, z, roll, pitch, yaw) and size (length, width, height) of the device. The vision thread and the laser thread run simultaneously, the laser radar and the binocular camera are rigidly fixed on the inspection robot, the relative pose of the inspection robot is fixed, the motion tracks of the binocular camera and the laser radar are aligned through the timestamp, namely time information, and the calculated semantic landmarks are projected into the sub-map constructed by the laser thread through the reserved interface when the sub-map is constructed.

The specific method of step 6 is as follows:

by optimizing the sub-map in the local coordinate system, the matching speed is increased, the stability is improved and the map with strong readability is constructed under the condition that the feature matching is not influenced. The completed sub-map is composed of a plurality of patrol robot poses with time information and laser scanning data thereof, a set of coordinates of the laser scanning data of each pose in the completed sub-map is subjected to curve fitting of nonlinear least squares, a Ceres library is called in a curve fitting algorithm to perform optimization, so that single laser or multiple laser penetrating gaps do not appear in the map any more, the ground edge jumps, Gaussian filtering is performed on the sub-map, the map boundary is further blurred, and negative optimization of the map caused by over-rich characteristics on positioning is greatly reduced on the premise that global positioning is not influenced. Inserting the sub-map after fitting the edge into the global map, wherein the origin of the first sub-map is located on the origin of the global map, and using the optimal pose in the first sub-map as the initial pose of the second sub-map, further generating the second sub-map connected with the first sub-map on the global map, continuously circulating the process, and completing the construction of the global map in the complex operation and maintenance scene of the transformer substation by creating a large number of sub-maps until the program is finished, so as to finally obtain the grid map for stable navigation.

The invention has the beneficial effects that: the method provided by the invention provides a positioning and mapping scheme based on the map which is constructed according with human thinking in combination with the complex operation and maintenance scene of the transformer substation, combines the advantages of a positioning algorithm based on map optimization and deep learning, can obtain a complete navigation map which can be stably positioned, greatly shortens the characteristic matching time, greatly increases the positioning stability in the complex operation and maintenance scene of the transformer substation, and enables the navigation in the complex operation and maintenance scene of the transformer substation to be faster and more reliable.

Innovation points 1: through processing the field scanning laser data of the transformer substation area, the positioning and mapping optimization of the transformer substation in the complex operation and maintenance environment is realized. Different from the mainstream feature matching idea, the map boundary is fitted once by using the least square method under the complex operation and maintenance environment of the transformer substation, so that the noise caused by penetrating laser is reduced, the constructed sub-map is filtered, a large amount of unnecessary calculated amount is reduced, the matching speed is increased, and the condition of mismatching is reduced.

Innovation points 2: the method comprises the steps of scanning a transformer substation scene by using a binocular camera, carrying out target detection on the obtained key frames, solving the posture and the size of the equipment by detecting the size and the depth of a detection frame of the same object in the adjacent key frames under different visual angles, providing a new idea and a new method for subsequent visual obstacle avoidance, and laying a foundation for establishing a complete grid map under the complex environment of the transformer substation.

Innovation points 3: the device model obtained from the point cloud is projected to the grid map constructed by the laser, so that the stereoscopic map information which cannot be obtained by the single-line laser radar in the operation and maintenance environment of the transformer substation is compensated, and the reliability of the map is greatly improved.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

Detailed Description

The objects and effects of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.

The purpose that the inspection robot replaces manpower to inspect under the complex operation and maintenance environment of the transformer substation is achieved, normal operation of the transformer substation is guaranteed, and better readability of a navigation map is achieved during follow-up manual navigation of the robot. The invention solves the problems of poor human readability, long positioning time and easy loss of the map constructed in a complex environment, and provides a method for realizing stable positioning in a transformer substation and constructing a navigation map with strong human readability. The implementation flow chart is shown in figure 1. The specific implementation steps are as follows:

step 1: the substation inspection robot scans the environment of the whole substation through the carried single-line laser radar to obtain point cloud information, namely point cloud data, of a single plane of the substation. When the inspection robot moves, the wheel speed meter and the inertial sensor IMU which are corrected in advance through a calibration algorithm are used for obtaining the wheel speed and IMU information of the inspection robot, and the real-time speed and posture of the robot are obtained through calculation and used for removing the motion distortion of the laser radar. The calibrated binocular camera is used for reducing the influence of image distortion on pose estimation, and the binocular camera is started when the inspection robot is started for acquiring binocular image data. And then, filtering the obtained binocular image, so that the feature points in the binocular image can be better found by a subsequent algorithm.

And 2, step: and inputting the speed and attitude data of the robot obtained by the wheel speed meter and the inertial sensor into a Pose inference device (Pose explicit) to solve and obtain the predicted Pose of the inspection robot. And inputting the laser radar data without motion distortion and the pose estimated by the pose inference engine into a scanning matcher (Scan Matching), and finishing optimization calculation by using a Ceres library to obtain an observed value of the pose of the robot. The observed value is fed back to the pose inference engine for correcting the estimated value on one hand, and is provided for a Motion Filter (Motion Filter) for judging whether the robot is in a Motion state or not on the other hand. To avoid inserting too much scan frame data per sub-map, once the scan matcher generates a new observation pose, the motion filter calculates its variance from the previous observation pose. And if the change exceeds a set change threshold, the motion filter obtains real pose data of the robot by matching the observation pose at the moment through a Correlation Scanning Matching (CSM) method and the prediction pose at the moment, and performs optimization solution through nonlinear least squares to obtain optimal matching, so as to obtain the actual coordinates of each laser point in the point cloud data at the moment. When a sub map starts to be maintained, the coordinate origin of the sub map is the coordinate origin of the real pose of the first frame. And converting the laser point cloud into a local coordinate system of the current sub-map, inserting the laser point cloud into the sub-map which is maintained, and outputting a result, wherein the sub-map comprises the position and the laser scanning data of the inspection robot with time information. And continuously inserting data frames while optimizing the pose to update the sub-map, scoring the real pose of each frame inserted into the sub-map, obtaining the optimal pose when the sub-map is created, and considering that the sub-map is created when the sub-map is not updated any more. In addition, an interface is reserved for visual fusion in the sub-map construction process, and the visual thread can obtain the permission of modifying the sub-map through the interface.

And 3, step 3: the binocular image processing method based on the laser thread positioning comprises the steps that the laser thread positioning and the image building are carried out, meanwhile, the binocular image processing, namely the vision thread, is carried out synchronously, and the vision thread comprises a tracking thread and a target detection thread. When a new frame of image is input, the tracking thread is entered. In a tracking thread, ORB (ordered FAST and ordered BRIEF) feature extraction is firstly carried out on the left image and the right image of the frame image, ORB feature points are obtained by combining a FAST feature detection method and a BRIEF feature descriptor for optimization, and rapid feature extraction and feature description can be carried out on the image. The ORB feature points have scale and rotation invariance, so they are robust. And after the characteristic points are extracted, matching the left image and the right image once, only keeping the characteristic points successfully matched, and performing depth calculation. And after the next frame of image enters a tracking thread, repeating the step, matching the obtained feature points with the feature points of the previous frame of image, and then optimizing the current camera pose by using the matched feature points. And after the pose of the camera is obtained, converting the pose into a robot pose, aligning the robot pose with a robot track obtained by laser scanning calculation, and obtaining the coordinates of each point cloud under the world coordinates. In order to ensure the quality of key frame tracking and avoid introducing excessive information redundancy, only when the number of point clouds included in a certain frame of image exceeds a set point cloud number threshold and the repeatability of the point clouds is lower than a set repeatability threshold, the point clouds are regarded as a frame of key frame.

And 4, step 4: in view of hardware performance of the inspection robot, if point cloud data of all frames are detected, a great load is inevitably caused to system calculation. Based on the consideration of saving computing resources, the method only carries out target detection on the key frames extracted from the tracking thread, namely the target detection thread. The target detection algorithm uses FASTER R-CNN, which is trained using the dataset of medium and large sized ground proximity devices in the substation, for detecting keyframes. When it detects a device in the training dataset, it outputs the classification result and the regression result, i.e. the label and coordinates of the prediction box.

And 5: due to the high frequency of the camera, the movement of the inspection robot among a plurality of frames of images is not too large, and a plurality of objects which are the same under different visual angles exist in the images. When the target detection program detects the device in a certain key frame, the target detection program performs target detection on ten adjacent frames of data before and after the key frame to perform misjudgment detection. If the same device can be detected by eight frames of images, the target detection is considered to be successful, otherwise, the detection is considered to be invalid. Because the projection sizes of the cameras facing the object are different under different viewing angles, the sizes of the object detection frames in different frames can be changed along with the movement of the cameras. And when the target detection is successful, solving through the poses of the adjacent frames of the camera in the world coordinate system and the sizes and coordinates of the 2D detection frames of the detected equipment in the adjacent frames to obtain the 3D cube-form semantic landmark of the equipment in the world coordinate system. The semantic landmark is represented by a 9-dimensional vector, the pose (x, y, z, roll, pitch, yaw) and the size (length, width, height) of the device are included, and the pose information can be used for subsequent visual obstacle avoidance and a satisfactory effect is obtained. The vision thread and the laser thread run simultaneously, the laser radar and the binocular camera are rigidly fixed on the inspection robot, the relative pose of the inspection robot is fixed, the motion tracks of the binocular camera and the laser radar are aligned through the timestamp, namely time information, and the calculated semantic landmarks are projected into the sub-map constructed by the laser thread through the reserved interface when the sub-map is constructed.

And 6: in consideration of the actual complex operation and maintenance environment of the transformer substation, the two-dimensional laser radar has weak perception on objects such as fences and equipment supports in an equipment area, the phenomenon that laser penetrates through equipment from gaps occurs, and the readability and the positioning stability of subsequent drawing construction are greatly affected. Most of the existing two-dimensional algorithm frameworks are based on ideal environments, such as indoor environments or outdoor scenes with simple edge features, and can obtain ideal positioning and mapping effects. However, the algorithms have some defects in the complex operation and maintenance environment of the substation, such as slow positioning speed and even positioning failure. Based on the above description, the sub-maps in the local coordinate system are optimized, so that the matching speed is increased, the stability is improved and the map with strong readability is constructed under the condition that the feature matching is not influenced. The finished sub-map is formed by a plurality of patrol robot poses with time information and laser scanning data thereof, the set of coordinates of the laser scanning data of each pose in the finished sub-map is subjected to curve fitting of nonlinear least squares once, a Ceres library is called in a curve fitting algorithm to perform optimization, so that single-beam laser or multiple-beam laser penetrating gaps do not occur in the map any more, the ground edge is subjected to jump, Gaussian filtering is performed on the sub-map, the map boundary is further blurred, and the negative optimization of the map caused by over-rich characteristics on the positioning is greatly reduced on the premise of not influencing the global positioning. Inserting the sub-map after fitting the edge into the global map, wherein the origin of the first sub-map is located on the origin of the global map, and using the optimal pose in the first sub-map as the initial pose of the second sub-map, further generating the second sub-map connected with the first sub-map on the global map, continuously circulating the process, and completing the construction of the global map in the complex operation and maintenance scene of the transformer substation by creating a large number of sub-maps until the program is finished, so as to finally obtain the grid map for stable navigation.

The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims

1. A transformer substation scene mapping and positioning optimization method based on laser and vision fusion is characterized by comprising the following steps:

step 2: carrying out local positioning and map building to build a sub map;

and step 3: while carrying out laser thread positioning and mapping, synchronously carrying out binocular image processing, namely a visual thread, realizing characteristic point tracking and judging key frames;

and 6: and optimizing the sub-map, and performing global optimization.

2. The transformer substation scene mapping and positioning optimization method based on laser and vision fusion of claim 1 is characterized in that the specific method in step 1 is as follows:

the transformer substation inspection robot scans the environment of the whole transformer substation through the carried single-line laser radar to obtain point cloud information, namely point cloud data, of a single plane of the transformer substation; when the inspection robot moves, a wheel speed meter and an inertial sensor IMU (inertial measurement unit) which are corrected in advance through a calibration algorithm are used for obtaining the wheel speed and IMU information of the inspection robot, and the real-time speed and posture of the robot are obtained through calculation and used for removing the motion distortion of a laser radar; the calibrated binocular camera is used for reducing the influence of image distortion on pose estimation, and the binocular camera is started when the inspection robot is started and is used for acquiring binocular image data; and then, filtering the obtained binocular image to better find the characteristic points in the binocular image by a subsequent algorithm.

3. The transformer substation scene mapping and positioning optimization method based on laser and vision fusion as claimed in claim 2, wherein the specific method in step 2 is as follows:

inputting the speed and attitude data of the robot obtained by a wheel speed meter and an inertial sensor into a position and attitude reasoner (Pose extractor), and solving to obtain the predicted position and attitude of the inspection robot; inputting the laser radar data without motion distortion and the pose estimated by the pose inference device into a Scan matcher, and completing optimization calculation by using a Ceres library to obtain an observed value of the pose of the robot; the observed value is fed back to the pose inference engine for correcting the estimated value on one hand, and is provided to a Motion Filter for judging whether the robot is in a Motion state or not on the other hand; in order to avoid inserting too much scanning frame data into each sub map, once the scanning matcher generates a new observation pose, the motion filter calculates the variation of the new observation pose and the previous observation pose; if the change exceeds a set change threshold value, the motion filter matches the current observation pose with the current prediction pose through a correlation scanning matching CSM method to obtain real pose data of the robot, and performs optimization solution through nonlinear least squares to obtain optimal matching, so as to obtain actual coordinates of each laser point in the point cloud data at the moment; when a sub map starts to be maintained, the coordinate origin of the sub map is the coordinate origin of the real pose of the first frame; converting the laser point cloud into a local coordinate system of a current sub-map, inserting the laser point cloud into the sub-map which is being maintained, and outputting a result, wherein the sub-map comprises the position and the laser scanning data of the inspection robot with time information; continuously inserting data frames while optimizing the pose to update the sub-map, scoring the real pose of each frame inserted into the sub-map, obtaining the optimal pose when the sub-map is created, and considering that the sub-map is created when the sub-map is not updated any more; in addition, an interface is reserved for visual fusion in the sub-map construction process, and the visual thread can obtain the permission of modifying the sub-map through the interface.

4. The transformer substation scene mapping and positioning optimization method based on laser and vision fusion of claim 3 is characterized in that the specific method in step 3 is as follows:

the method comprises the following steps that while laser thread positioning and drawing construction are carried out, binocular images are processed synchronously, namely a visual thread is carried out, and the visual thread comprises a tracking thread and a target detection thread; when a new frame of image is input, a tracking thread is entered; in a tracking thread, firstly, carrying out ORB (organized FAST and Rotated BRIEF) feature extraction on a left image and a right image of the frame of image, after feature points are extracted, carrying out primary matching on the left image and the right image, only reserving feature points successfully matched, and carrying out depth calculation; after the next frame of image enters a tracking thread, repeating the step, matching the obtained feature points with the feature points of the previous frame of image, and then optimizing the current camera pose by using the matched feature points; after the pose of the camera is obtained, the pose of the camera is converted into the pose of the robot, and the pose of the robot is aligned with the track of the robot obtained by laser scanning calculation, so that the coordinates of each point cloud under world coordinates are obtained; in order to ensure the quality of key frame tracking and avoid introducing excessive information redundancy, only when the point cloud number contained in a certain frame image exceeds a set point cloud number threshold and the repeatability of the point cloud is lower than the set repeatability threshold, the point cloud is regarded as a frame key frame.

5. The transformer substation scene mapping and positioning optimization method based on laser and vision fusion as claimed in claim 4, wherein the specific method in step 4 is as follows:

performing target detection on the key frames extracted from the tracking thread, namely performing target detection thread; the target detection algorithm uses FASTER R-CNN, trains the FASTER R-CNN by using a data set of medium and large near-ground equipment in a transformer substation, and is used for detecting key frames; when it detects a device in the training dataset, it outputs the classification result and the regression result, i.e. the label and coordinates of the prediction box.

6. The transformer substation scene mapping and positioning optimization method based on laser and vision fusion as claimed in claim 5, wherein the specific method in step 5 is as follows:

when a target detection program detects equipment in a certain key frame, target detection is carried out on ten adjacent frame data before and after the key frame so as to carry out misjudgment detection; if the same equipment can be detected by eight frames of images, the target detection is considered to be successful, otherwise, the detection is considered to be invalid; because the projection sizes of the camera facing the object are different under different visual angles, the sizes of the object detection frames in different frames can be changed along with the movement of the camera; when the target detection is successful, a 3D cube-form semantic landmark of the equipment in the world coordinate system is obtained through the pose of the camera in the adjacent frame in the world coordinate system and the size and coordinate solution of the 2D detection frame of the equipment to be detected in the adjacent frame; the semantic landmark is represented by a 9-dimensional vector, and comprises the pose (x, y, z, roll, pitch, yaw) and the size (length, width, height) of the device; the vision thread and the laser thread run simultaneously, the laser radar and the binocular camera are rigidly fixed on the inspection robot, the relative pose of the inspection robot is fixed, the motion tracks of the binocular camera and the laser radar are aligned through the timestamp, namely time information, and the calculated semantic landmarks are projected into the sub-map constructed by the laser thread through the reserved interface when the sub-map is constructed.

7. The transformer substation scene mapping and positioning optimization method based on laser and vision fusion of claim 6 is characterized in that the specific method in step 6 is as follows:

by optimizing the sub-map in the local coordinate system, the matching speed is increased, the stability is improved and a map with strong readability is constructed under the condition that the feature matching is not influenced; the finished sub-map is composed of a plurality of patrol robot poses with time information and laser scanning data thereof, a set of coordinates of the laser scanning data of each pose in the finished sub-map is subjected to curve fitting of nonlinear least squares, a Ceres library is called in a curve fitting algorithm to perform optimization, so that single-beam laser or multiple-beam laser penetrating gaps do not occur in the map any more, the ground edges are subjected to jump, Gaussian filtering is performed on the sub-map, the map boundary is further blurred, and negative optimization of the map caused by over-abundant characteristics on positioning is greatly reduced on the premise of not influencing global positioning; inserting the sub-map after fitting the edge into the global map, wherein the origin of the first sub-map is located on the origin of the global map, and using the optimal pose in the first sub-map as the initial pose of the second sub-map, further generating the second sub-map connected with the first sub-map on the global map, continuously circulating the process, and completing the construction of the global map in the complex operation and maintenance scene of the transformer substation by creating a large number of sub-maps until the program is finished, so as to finally obtain the grid map for stable navigation.