CN114913224A - Composition method for mobile robot based on visual SLAM - Google Patents

Composition method for mobile robot based on visual SLAM

Info

Publication number
CN114913224A
Authority
CN
China
Prior art keywords
current frame
pose
frame
point cloud
depth point
Prior art date
Legal status
Pending
Application number
CN202110178063.6A
Other languages
Chinese (zh)
Inventor
王培建
张新远
王洁
郭云雷
向可
Current Assignee
Zhejiang Sunny Optical Intelligent Technology Co Ltd
Original Assignee
Zhejiang Sunny Optical Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Sunny Optical Intelligent Technology Co Ltd filed Critical Zhejiang Sunny Optical Intelligent Technology Co Ltd
Priority to CN202110178063.6A
Publication of CN114913224A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C 21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/10044 Radar image

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

Disclosed is a composition method for a mobile robot based on visual SLAM, which includes: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame; acquiring an RGB image of a target environment and a depth image corresponding to the RGB image through an RGB-D camera module, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference; generating a sequence of key frames based on SLAM visual tracking; and generating a target map based on the corrected depth point clouds and the poses of all key frames in the key frame sequence. The composition method fuses pose data and corrects the depth point cloud based on inter-frame pose transformation, improving the accuracy of map construction and positioning.

Description

Composition method for mobile robot based on visual SLAM
Technical Field
The present application relates to the field of visual SLAM, and more particularly, to a composition method for a mobile robot based on visual SLAM, a positioning method for a mobile robot based on visual SLAM, a composition apparatus for a mobile robot based on visual SLAM, a positioning apparatus for a mobile robot based on visual SLAM, and a robot.
Background
SLAM (Simultaneous Localization and Mapping), i.e., simultaneous positioning and map construction, refers to: without prior information, a subject carrying specific sensors constructs a map of the target environment during its movement and estimates its own pose to achieve autonomous positioning.
According to the sensor used, SLAM mainly includes laser SLAM and visual SLAM: when the sensor is mainly a laser radar, it is called laser SLAM; when the sensor is mainly an image capture device (e.g., a camera), it is called visual SLAM. Laser SLAM technology is mature; however, laser radar is expensive, has demanding installation requirements, a short service life, a limited detection range, and lacks semantic information. A camera, by contrast, is low in cost, can be installed in diverse ways, has no hard limit on the sensing distance, and allows semantic information to be extracted, so visual SLAM is a mainstream direction of future research.
In recent years, the visual SLAM technology is widely applied to mobile terminal devices such as robots, automatic driving automobiles, unmanned aerial vehicles and the like to realize functions such as map construction, navigation positioning, path planning and the like. The depth camera is a visual sensor commonly used in mobile terminal equipment applying visual SLAM technology, and can be used for measuring the distance from a lens of the depth camera to a shot object, namely depth data.
Depth cameras can be divided into passive depth cameras (e.g., passive binocular depth cameras) and active depth cameras (e.g., structured-light depth cameras, TOF depth cameras). Compared with passive binocular vision, measuring depth data with TOF is relatively easy to implement; in particular, the i-TOF depth camera, i.e., a TOF camera that measures depth using the indirect TOF (indirect-TOF: i-TOF) phase-difference principle, is widely used owing to its advantages in image resolution, application cost, and so on.
A typical approach is to collect RGB images and depth point clouds of the target environment with an i-TOF depth camera and then perform simultaneous map construction and positioning with the RGB images and depth point clouds, so as to achieve navigation and obstacle avoidance for a robot (such as a sweeping robot). In practical application, however, the visual SLAM scheme based on the i-TOF depth camera has some technical problems.
Firstly, the field of view of an ordinary i-TOF depth camera is small, so the overlapping area between frames is small, which hinders pose estimation. Moreover, if the i-TOF depth camera is used as the only pose sensor, it is difficult to adapt to a variety of scenes, because data acquired from a single sensor are insufficient for accurate pose estimation, resulting in low map construction and positioning accuracy.
Secondly, the i-TOF depth camera is susceptible to multi-path interference (MPI) during operation, so that the obtained depth point cloud has a large error, resulting in a reduction in the accuracy of map construction and positioning.
Specifically, because complex diffuse and specular reflections exist in the target environment, after the i-TOF depth camera projects detection light onto a target object, points on the target object at different distances from the camera lens may reflect light back to the camera at the same time, so the measured depth deviates from the actual value; that is, the depth value of each pixel point in the depth point cloud contains an error. Moreover, in a scene with corners, the light returning to the i-TOF depth camera may have undergone one or multiple reflections; in other words, for points on the target object whose reflected light reaches the camera at the same time, the corresponding depth values may be inconsistent.
Accordingly, an optimized visual SLAM-based composition and positioning scheme is desired.
Disclosure of Invention
One advantage of the present application is to provide a composition method for a mobile robot based on a visual SLAM, a positioning method for a mobile robot based on a visual SLAM, a composition apparatus for a mobile robot based on a visual SLAM, a positioning apparatus for a mobile robot based on a visual SLAM, and a robot, wherein the composition method improves the accuracy of map construction and positioning by fusing poses and correcting a depth point cloud based on inter-frame pose transformation.
Another advantage of the present application is to provide a composition method for a mobile robot based on a visual SLAM, a positioning method for a mobile robot based on a visual SLAM, a composition device for a mobile robot based on a visual SLAM, a positioning device for a mobile robot based on a visual SLAM, and a robot, wherein the multi-path interference of the collected depth point cloud is subjected to attribute statistics to quantify the condition of the multi-path interference of each pixel point in the depth point cloud.
To achieve at least one of the above advantages or other advantages and objects, according to one aspect of the present application, there is provided a composition method for a mobile robot based on a visual SLAM, including: step 1: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame;
step 2: acquiring an RGB image of a target environment and a depth image corresponding to the RGB image through an RGB-D camera module, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference;
step 3: generating a sequence of key frames based on SLAM visual tracking, wherein step 3 comprises:
step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by inertial sensors and corresponding to the current frame and the previous frame respectively;
step 32: inputting the estimated pose of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame;
step 33: generating an inter-frame pose transformation matrix based on the pose of the current frame and the pose of the previous frame;
step 34: correcting pixel points of which the attribute values of multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame to obtain corrected depth point cloud of the current frame;
step 35: iteratively performing steps 31-34 to obtain the sequence of key frames, each key frame of the sequence of key frames having a corrected depth point cloud; and
step 4: generating a target map based on the corrected depth point clouds and poses of all key frames in the key frame sequence.
In the composition method for a mobile robot based on a visual SLAM according to the present application, the inertial sensor includes an inertial measurement unit and a wheel encoder, the pose data includes first pose data obtained by the inertial measurement unit and second pose data obtained by the wheel encoder; wherein, step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by inertial sensors corresponding to the current frame and the previous frame respectively, comprising:
obtaining a fusion pose corresponding to the current frame based on a loss function, wherein the loss function is C(x_k) = cost_imu + cost_wheel + cost_camera;
wherein cost_imu represents an integral loss function of the first pose data corresponding to the current frame and the previous frame respectively, with the estimated pose as the estimated quantity; cost_wheel represents an integral loss function of the second pose data corresponding to the current frame and the previous frame respectively, with the estimated pose as the estimated quantity; and cost_camera is a reprojection error generated based on the feature points of the RGB image of the current frame.
In the composition method for a mobile robot based on a visual SLAM according to the present application, step 32: inputting the estimated poses of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame, comprises:
inputting the estimated poses of the current frame and the S image frames consecutive to the current frame into an optimization function, and calculating a marginalization matrix and a displacement vector of the optimization function to be used as the pose of the current frame;
wherein the optimization function is C(x_i) = Σ(cost_imu + cost_wheel + cost_camera).
In the composition method for a mobile robot based on visual SLAM according to the present application, step 34: based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame, correcting the pixel points of which the attribute values of the multipath interference in the depth point cloud of the current frame are higher than a preset threshold value to obtain the corrected depth point cloud of the current frame, wherein the correction comprises the following steps:
screening out pixel points to be replaced, of which the attribute values of multipath interference in the depth point cloud of the current frame are higher than a preset threshold value;
converting pixel points corresponding to the pixel points to be replaced in the depth point cloud of the previous frame into replacement pixel points through the inter-frame pose transformation matrix; and
replacing the pixel points to be replaced in the depth point cloud of the current frame with the replacement pixel points to generate the corrected depth point cloud of the current frame.
In the composition method for a mobile robot based on visual SLAM according to the present application, step 4: generating a target map based on the corrected depth point clouds and poses of all keyframes in the sequence of keyframes, including:
generating a probability grid map based on corrected depth point clouds and poses of all key frames in the key frame sequence;
generating a point cloud map based on corrected depth point clouds of all key frames in the key frame sequence;
fusing the point cloud map and the probability grid map to generate the target map.
In the composition method for a mobile robot based on a visual SLAM according to the present application, step 3 further comprises: performing back-end optimization on the key frame sequence and/or the key frames;
wherein, the back-end optimization at least comprises one of the following steps:
constructing a common-view constraint of the feature points based on the feature points of the RGB images of the key frames and a bag-of-words model;
constructing pose constraints among the continuous key frames based on ICP matching; and
performing global nonlinear optimization by bundle adjustment when a closed loop is generated.
In the composition method for a mobile robot based on a visual SLAM according to the present application, the attribute value of the multipath interference is a confidence of the multipath interference.
According to another aspect of the present application, there is provided a positioning method for a mobile robot based on a visual SLAM, including:
step 110: generating the target map based on the composition method for the mobile robot based on the visual SLAM;
step 120: generating a local map by the composition method for the mobile robot based on the visual SLAM based on the local RGB image and the depth point cloud of the target environment acquired by the mobile robot;
step 130: determining a repositioning initial pose of the mobile robot based on a non-linear match between the local map and the target map;
step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined by pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function; and
step 150: inputting the estimated poses of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame.
In the positioning method for a mobile robot based on visual SLAM according to the present application, the step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined based on pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function, wherein the method comprises the following steps:
obtaining an estimated pose of the current frame based on a probability matching loss function, wherein the loss function is C(xp_k) = cost_imu + cost_wheel + cost_camera + cost_point;
wherein cost_imu represents an integral loss function of the first pose data corresponding to the current frame and the previous frame respectively, with the estimated pose as the estimated quantity; cost_wheel represents an integral loss function of the second pose data corresponding to the current frame and the previous frame respectively, with the estimated pose as the estimated quantity; cost_camera is a reprojection error of the feature points of the RGB image of the current frame on the point cloud map; and cost_point is a matching error between the corrected depth point cloud of the current frame and the probability grid map.
In the positioning method for a mobile robot based on visual SLAM according to the present application, step 150: inputting the estimated poses of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame, comprises: inputting the estimated poses of the current frame and the S image frames consecutive to the current frame into an optimization function as described below, and calculating a marginalization matrix and a displacement vector of the optimization function to be used as the pose of the current frame;
wherein the optimization function is C(xp_k) = Σ(cost_imu + cost_wheel + cost_camera + cost_point).
According to another aspect of the present application, there is provided a composition apparatus for a mobile robot based on a visual SLAM, including:
an initialization unit for performing step 1: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame;
a data acquisition unit for performing step 2: acquiring an RGB image of a target environment and a depth image corresponding to the RGB image through an RGB-D camera module, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference;
a tracking unit for performing step 3: generating a sequence of key frames based on SLAM visual tracking, wherein step 3 comprises:
step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by an inertial sensor and corresponding to the current frame and the previous frame respectively;
step 32: inputting the estimated pose of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame;
step 33: generating an inter-frame pose transformation matrix based on the pose of the current frame and the pose of the previous frame;
step 34: correcting pixel points of which the attribute values of multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame to obtain corrected depth point cloud of the current frame;
step 35: iteratively performing steps 31-34 to obtain the sequence of keyframes, each keyframe of the sequence of keyframes having a corrected depth point cloud; and
a map generation unit for executing step 4: generating a target map based on the corrected depth point clouds and poses of all key frames in the key frame sequence.
According to another aspect of the present application, there is provided a positioning apparatus for a mobile robot based on a visual SLAM, including:
a map construction unit for performing step 110: generating the target map based on the composition method for the mobile robot based on the visual SLAM;
a local map building unit, configured to perform step 120: generating a local map by the composition method for the mobile robot based on the visual SLAM based on the local RGB image and the depth point cloud of the target environment acquired by the mobile robot;
a repositioning initial pose generating unit configured to perform step 130: determining a repositioning initial pose of the mobile robot based on a non-linear match between the local map and the target map;
a pose estimation unit for performing step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined by pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function; and
and a nonlinear optimization unit, configured to perform step 150: inputting the estimated pose of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization, so as to generate the pose of the current frame.
According to another aspect of the present application, there is provided a robot including:
a robot main body;
the RGB-D camera module is arranged on the robot main body;
an inertial sensor mounted to the robot body; and
a data processing device communicably connected to the RGB-D camera module and the inertial sensor, wherein the data processing device is configured to execute the composition method for a mobile robot based on the visual SLAM or the positioning method for a mobile robot based on the visual SLAM as described above.
Further objects and advantages of the present application will become apparent from an understanding of the ensuing description and drawings.
These and other objects, features and advantages of the present application will become more fully apparent from the following detailed description, the accompanying drawings and the claims.
Drawings
Fig. 1 illustrates a flowchart of a composition method for a mobile robot based on a visual SLAM according to an embodiment of the present application.
Fig. 2 illustrates a flowchart for generating a sequence of key frames based on SLAM visual tracking in a composition method for a mobile robot based on visual SLAM according to an embodiment of the present application.
Fig. 3 illustrates a flowchart of correcting, in the positioning method for a mobile robot based on a visual SLAM according to an embodiment of the present application, pixel points of which an attribute value of multipath interference is higher than a preset threshold in the depth point cloud of the current frame based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame, so as to obtain a corrected depth point cloud of the current frame.
Fig. 4 illustrates a flowchart of a positioning method for a mobile robot based on a visual SLAM according to an embodiment of the present application.
FIG. 5 illustrates a block diagram of a composition apparatus for a mobile robot based on a visual SLAM according to an embodiment of the present application.
Fig. 6 illustrates a block diagram of a positioning apparatus for a mobile robot based on a visual SLAM according to an embodiment of the present application.
Fig. 7 illustrates a block diagram of a robot in accordance with an embodiment of the present application.
Detailed Description
The following description is presented to disclose the application and to enable any person skilled in the art to practice the application. The embodiments in the following description are intended only as examples, and other obvious variations will occur to those skilled in the art. The underlying principles of the application, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the application.
Summary of the application
As mentioned above, there are some technical problems in the practical application of the visual SLAM scheme based on the i-TOF depth camera. Firstly, the field angle of a common i-TOF depth camera is small, and the inter-frame overlapping area is small, so that pose estimation is not facilitated; and a single sensor is difficult to adapt to various different scenes, and data acquired from the single sensor is insufficient to realize accurate estimation of the pose, so that the map construction and positioning accuracy is low. Secondly, the i-TOF depth camera is susceptible to multi-path interference (MPI), resulting in errors in the depth values of the pixels in the obtained depth point cloud.
For the problem of low map construction and positioning precision caused by low pose estimation precision, on one hand, a wide-angle i-TOF depth camera can be used for expanding the field angle and increasing the inter-frame overlapping area so as to facilitate the matching of corresponding feature points between key frame images and further improve the pose estimation precision. On the other hand, the pose data can be acquired by using a plurality of sensors, and the reliability of the pose data is improved by fusing the pose data from different sensors, so that the map construction and positioning precision is further improved. For example: the pose data from the inertial sensor can be fused, and the pose data is subjected to nonlinear optimization so as to improve the accuracy of map construction and positioning.
As for the problem of low map construction and positioning accuracy caused by multipath interference, the inventors of the present application address it starting from the processing of the depth point cloud data. Specifically, under the influence of multipath interference, portions of the obtained depth point cloud may carry large errors; the pixel points with large errors can first be identified, and then corrected with a specific data correction scheme, so as to improve the accuracy of map construction and positioning.
In the technical scheme of the present application, attribute statistics can first be performed on the multipath interference of the collected depth point cloud, so as to quantify the multipath interference affecting each pixel point in the depth point cloud. Here, the attribute value of the multipath interference reflects the degree of multipath interference on the depth point cloud: the higher the attribute value, the greater the degree of multipath interference; conversely, the lower the attribute value, the lower the degree of multipath interference. For example, the attribute value of the multipath interference may be implemented as the confidence of the multipath interference: a higher confidence indicates a greater degree of multipath interference, and a lower confidence indicates a lower degree. Correspondingly, the portion of the depth point cloud to be corrected can be identified through the attribute value of the multipath interference.
In the depth point cloud correction process, because data between two adjacent frames of depth point clouds are correlated, in the technical scheme of the application, the inventor tries to correct a depth point cloud part to be corrected in the depth point cloud of the current frame through an inter-frame pose transformation matrix generated by poses of adjacent key frames.
Based on this, the present application proposes a composition method for a mobile robot based on a visual SLAM, which includes: step 1: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame; step 2: acquiring an RGB image of a target environment and a depth image corresponding to the RGB image through an RGB-D camera module, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference; step 3: generating a sequence of key frames based on SLAM visual tracking, wherein step 3 comprises: step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by an inertial sensor and corresponding to the current frame and the previous frame respectively; step 32: inputting the estimated pose of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame; step 33: generating an inter-frame pose transformation matrix based on the pose of the current frame and the pose of the previous frame; step 34: correcting pixel points of which the attribute values of multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame to obtain a corrected depth point cloud of the current frame; step 35: iteratively performing steps 31-34 to obtain the sequence of key frames, each key frame of the sequence of key frames having a corrected depth point cloud; and step 4: generating a target map based on the corrected depth point clouds and poses of all key frames in the key frame sequence.
Based on this, the present application also proposes a positioning method for a mobile robot based on a visual SLAM, which includes: step 110: generating the target map based on the composition method for the mobile robot based on the visual SLAM; step 120: generating a local map by the composition method for the mobile robot based on the visual SLAM based on the local RGB image and the depth point cloud of the target environment acquired by the mobile robot; step 130: determining a repositioning initial pose of the mobile robot based on a non-linear matching between the local map and the target map; step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined based on pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function; and step 150: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame.
Based on this, the present application also proposes a composition apparatus for a mobile robot based on a visual SLAM, which includes: an initialization unit for performing step 1: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame; a data acquisition unit for performing step 2: acquiring an RGB image of a target environment and a depth image corresponding to the RGB image through an RGB-D camera module, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference; a tracking unit for performing step 3: generating a sequence of key frames based on SLAM visual tracking, wherein step 3 comprises: step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by an inertial sensor and corresponding to the current frame and the previous frame respectively; step 32: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame; step 33: generating an inter-frame pose transformation matrix based on the pose of the current frame and the pose of the previous frame; step 34: correcting pixel points of which the attribute values of multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame to obtain corrected depth point cloud of the current frame; step 35: iteratively performing steps 31-34 to obtain the sequence of keyframes, each keyframe of the sequence of keyframes having a corrected depth point cloud; and a map generation unit for executing step 4: and generating a target map based on the corrected depth point clouds and the poses of all key frames in the key frame sequence.
Based on this, the present application also proposes a positioning device for a mobile robot based on a visual SLAM, comprising: a map construction unit for performing step 110: generating the target map based on the composition method for the mobile robot based on the visual SLAM; a local map building unit, configured to perform step 120: based on the local RGB image and the depth point cloud of the target environment acquired by the mobile robot, generating a local map by the composition for the mobile robot based on the visual SLAM; a repositioning initial pose generating unit configured to perform step 130: determining a repositioning initial pose of the mobile robot based on a non-linear matching between the local map and the target map; a pose estimation unit for performing step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined by pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function; and a nonlinear optimization unit for performing step S150 of inputting the estimated poses of the current frame and S frame image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame.
Based on this, the present application also proposes a robot comprising: a robot main body; the RGB-D camera module is arranged on the robot main body; an inertial sensor mounted to the robot body; and a data processing device communicably connected to the RGB-D camera module and the inertial sensor, wherein the data processing device is configured to execute the visual SLAM-based composition method for the mobile robot or the visual SLAM-based positioning method for the mobile robot.
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
The following is a basic description of the pertinent terms referred to in this application.
Visual SLAM: a subject carrying a visual sensor, such as a robot, a vehicle, or an unmanned aerial vehicle, constructs a map of the target environment and estimates its own pose to achieve autonomous positioning while exploring the target environment without prior information.
Key frame: among all frames of a video, a frame that meets preset conditions and carries information useful for SLAM; with key frames, the pose of the subject carrying the visual sensor can be estimated effectively while reducing information redundancy.
Pose: position and attitude; the position can be represented by translations along the x, y and z directions of a coordinate system, and the attitude can be represented by rotations about the x, y and z directions of the coordinate system.
Point cloud: a set of sampling points of a target object acquired under the same spatial reference frame; it can express the spatial distribution and surface characteristics of the target object, provides information such as three-dimensional coordinates and depth, and can be obtained by converting a depth image.
Inertial sensor: a sensor that measures quantities such as acceleration and rotation in multiple degrees of freedom; in visual SLAM applications, pose data can be obtained from an inertial sensor.
Bag-of-words model: a feature extraction method that represents documents as feature vectors; in visual SLAM applications, a bag-of-words model can be used for feature extraction so as to quickly find similar images.
ICP: the Iterative Closest Point algorithm, a point cloud matching algorithm that transforms one point cloud according to the pose relationship between two point clouds, computes the error of the closest point pairs in the two point clouds after the transformation, and iterates until the error is below a threshold or the maximum number of iterations is reached.
Bundle adjustment (BA): the process of extracting the optimal 3D model and camera parameters (intrinsic and extrinsic) from a visual reconstruction; the bundles of light rays reflected from each feature point converge to the camera optical center once the camera poses and the spatial positions of the feature points have been optimally adjusted.
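For concreteness, the following sketch (an editorial illustration in Python, not an implementation taken from the patent) shows a common form of the two operations defined above: an ICP iteration that aligns two point clouds with nearest-neighbour matching and an SVD-based rigid fit, and the reprojection residual that bundle adjustment minimises over camera poses and feature-point positions. The pinhole intrinsics fx, fy, cx, cy in the residual are assumed parameters for illustration.

import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, max_iters=30, tol=1e-4):
    """Align point cloud src (Nx3) to dst (Mx3); returns a 4x4 rigid transform."""
    T_total = np.eye(4)
    tree = cKDTree(dst)
    for _ in range(max_iters):
        dist, idx = tree.query(src)            # closest point pairs
        matched = dst[idx]
        cs, cd = src.mean(axis=0), matched.mean(axis=0)
        H = (src - cs).T @ (matched - cd)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:               # avoid a reflection solution
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = cd - R @ cs
        src = src @ R.T + t                    # apply the incremental transform
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T_total = step @ T_total
        if dist.mean() < tol:                  # error below threshold: stop iterating
            break
    return T_total

def reprojection_residual(T_world_to_cam, point_world, observed_uv, fx, fy, cx, cy):
    """Pixel error that bundle adjustment minimises for one observation."""
    p = T_world_to_cam[:3, :3] @ point_world + T_world_to_cam[:3, 3]  # world -> camera frame
    u = fx * p[0] / p[2] + cx                                         # pinhole projection
    v = fy * p[1] / p[2] + cy
    return np.array([u, v]) - observed_uv                             # residual in pixels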
Exemplary composition method
The composition method based on the visual SLAM can be applied to mobile terminal devices such as robots, autonomous vehicles, unmanned aerial vehicles, and the like; for convenience of explanation and understanding, the composition method of the present embodiment is explained below taking a mobile robot as an example.
Fig. 1 illustrates a flowchart of a composition method for a mobile robot based on a visual SLAM according to an embodiment of the present application. Fig. 2 illustrates a flowchart for generating a sequence of key frames based on SLAM visual tracking in a composition method for a mobile robot based on visual SLAM according to an embodiment of the present application.
As shown in fig. 1 and 2, the composition method for a mobile robot based on a visual SLAM according to an embodiment of the present application includes: step 1: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame; step 2: acquiring an RGB image of a target environment and a depth image corresponding to the RGB image through an RGB-D camera module, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference; step 3: generating a sequence of key frames based on SLAM visual tracking, wherein step 3 comprises: step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by an inertial sensor and corresponding to the current frame and the previous frame respectively; step 32: inputting the estimated pose of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame; step 33: generating an inter-frame pose transformation matrix based on the pose of the current frame and the pose of the previous frame; step 34: correcting pixel points of which the attribute values of multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame to obtain a corrected depth point cloud of the current frame; step 35: iteratively performing steps 31-34 to obtain the sequence of key frames, each key frame of the sequence of key frames having a corrected depth point cloud; and step 4: generating a target map based on the corrected depth point clouds and poses of all key frames in the key frame sequence.
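To make the pose representation and the inter-frame pose transformation matrix of step 33 concrete, the sketch below (an editorial illustration, assuming poses are stored as a translation plus Euler angles and using NumPy/SciPy) packs a pose into a 4x4 homogeneous matrix and derives the transform that maps points from the previous frame's coordinates into the current frame's coordinates.

import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_matrix(tx, ty, tz, rx, ry, rz):
    """Build a 4x4 homogeneous transform from a translation and Euler angles (rad)."""
    T = np.eye(4)
    T[:3, :3] = Rotation.from_euler("xyz", [rx, ry, rz]).as_matrix()
    T[:3, 3] = [tx, ty, tz]
    return T

# World poses of the previous frame and the current frame (arbitrary example values).
T_prev = pose_to_matrix(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
T_curr = pose_to_matrix(0.10, 0.02, 0.0, 0.0, 0.0, np.deg2rad(5))

# Inter-frame pose transformation matrix (step 33): maps points expressed in the
# previous frame's camera coordinates into the current frame's camera coordinates.
T_prev_to_curr = np.linalg.inv(T_curr) @ T_prev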
In the step 2, an RGB image of a target environment and a depth image corresponding to the RGB image are obtained through an RGB-D camera module, the depth image is converted into a depth point cloud, and each pixel point of the depth point cloud has an attribute value of multipath interference. In the example of the application, the attribute value of the multipath interference is a confidence coefficient of the multipath interference, and reflects the degree of the multipath interference on the depth point cloud. The higher the attribute value of the multipath interference is, the greater the degree of the multipath interference is indicated; conversely, a lower attribute value of the multipath interference indicates a lower degree of the multipath interference.
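As a minimal sketch of step 2 (an editorial illustration; the pinhole intrinsics and the way the camera module reports the MPI confidence map are assumptions, not details fixed by the patent), a depth image can be back-projected into a depth point cloud with one row per pixel, carrying the multipath-interference attribute value as a fourth component:

import numpy as np

def depth_to_point_cloud(depth, mpi_confidence, fx, fy, cx, cy):
    """depth, mpi_confidence: HxW arrays; returns an (H*W, 4) array [X, Y, Z, mpi]."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) / fx * z                    # back-projection through the pinhole model
    y = (v - cy) / fy * z
    # one row per pixel, kept in image order so later steps can index corresponding pixels
    return np.stack([x, y, z, mpi_confidence], axis=-1).reshape(-1, 4)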
It is worth mentioning that, in the embodiment of the present application, in order to prevent a small field of view from degrading the pose estimation, the RGB-D camera module is preferably a wide-angle i-TOF camera module (horizontal field of view not less than 120°); the enlarged field of view increases the overlap area between frames, which facilitates the matching of corresponding feature points between key frame images and further improves the accuracy of the pose estimation.
Of course, in other examples of the present application, other suitable camera modules, such as other wide-angle camera modules or panoramic camera modules, may be selected, and multiple camera modules may also be used to enlarge the field of view; this is not intended to limit the scope of the present application.
As mentioned above, data obtained from a single sensor are not enough for accurate pose estimation; a plurality of sensors can be used to obtain pose data, and the reliability of the pose data is improved by fusing pose data from different sensors. In a specific example of the present application, pose data from inertial sensors are fused and nonlinear optimization is performed on the pose data to improve the accuracy of map construction and positioning. The inertial sensor comprises an inertial measurement unit and a wheel encoder, and the pose data comprises first pose data obtained by the inertial measurement unit and second pose data obtained by the wheel encoder. Of course, in other examples of the present application, pose data from other sensors, for example a GPS sensor, may also be fused, and this is not a limitation of the present application.
Specifically, step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by the inertial sensors corresponding to the current frame and the previous frame respectively, comprises: obtaining a fusion pose corresponding to the current frame based on a loss function, wherein the loss function is C(x_k) = cost_imu + cost_wheel + cost_camera; wherein cost_imu represents an integral loss function of the first pose data corresponding to the current frame and the previous frame respectively, with the estimated pose as the estimated quantity; cost_wheel represents an integral loss function of the second pose data corresponding to the current frame and the previous frame respectively, with the estimated pose as the estimated quantity; and cost_camera is a reprojection error generated based on the feature points of the RGB image of the current frame.
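The following sketch (an editorial illustration; the residual forms are deliberately simplified placeholders, since the patent does not spell them out) shows only the structure of the loss C(x_k) = cost_imu + cost_wheel + cost_camera, i.e., how the IMU term, the wheel-encoder term, and the reprojection term are summed for one estimated pose x_k:

import numpy as np

def cost_imu(x_k, x_prev, imu_pred):
    # discrepancy between the pose change predicted by IMU pre-integration
    # and the change implied by the estimated poses (placeholder form)
    return float(np.sum((x_k - x_prev - imu_pred) ** 2))

def cost_wheel(x_k, x_prev, wheel_pred):
    # same idea, using the wheel-encoder odometry increment (placeholder form)
    return float(np.sum((x_k - x_prev - wheel_pred) ** 2))

def cost_camera(reproj_residuals):
    # sum of squared reprojection errors of the current frame's feature points
    return float(np.sum(reproj_residuals ** 2))

def fused_loss(x_k, x_prev, imu_pred, wheel_pred, reproj_residuals):
    return (cost_imu(x_k, x_prev, imu_pred)
            + cost_wheel(x_k, x_prev, wheel_pred)
            + cost_camera(reproj_residuals))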
Accordingly, in one specific example of the present application, step 32: inputting the estimated pose of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame, comprises: inputting the estimated poses of the current frame and the S image frames consecutive to the current frame into an optimization function, and calculating a marginalization matrix and a displacement vector of the optimization function to be used as the pose of the current frame; wherein the optimization function is C(x_i) = Σ(cost_imu + cost_wheel + cost_camera).
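Continuing the sketch above (and reusing its fused_loss helper), the sliding-window objective C(x_i) = Σ(cost_imu + cost_wheel + cost_camera) simply accumulates that per-frame loss over the current frame and the S consecutive image frames in the window; the nonlinear solver and the marginalisation of old states, which yield the marginalization matrix and displacement vector, are omitted. The frames list of per-frame measurement bundles is a hypothetical structure for illustration.

def window_loss(poses, frames):
    """poses: list of S+1 pose estimates in the window; frames: matching measurement bundles."""
    total = 0.0
    for k in range(1, len(poses)):             # one term per frame in the sliding window
        f = frames[k]
        total += fused_loss(poses[k], poses[k - 1],
                            f["imu_pred"], f["wheel_pred"], f["reproj_residuals"])
    return total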
Fig. 3 illustrates a flowchart of correcting pixel points, in the depth point cloud of the current frame, of which the attribute value of multipath interference is higher than a preset threshold value, based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame, so as to obtain a corrected depth point cloud of the current frame, in the positioning method for a mobile robot based on a visual SLAM according to an embodiment of the present application.
As shown in fig. 3, step 34: correcting pixel points of which the attribute values of the multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the interframe pose transformation matrix and the depth point cloud of the previous frame to obtain the corrected depth point cloud of the current frame, wherein the correction process comprises the following steps: screening out pixel points to be replaced, of which the attribute values of multipath interference in the depth point cloud of the current frame are higher than a preset threshold value; converting pixel points corresponding to the pixel points to be replaced in the depth point cloud of the previous frame into replacement pixel points through the inter-frame pose transformation matrix; and replacing the pixel points to be replaced in the depth point cloud of the current frame by the replacement pixel points to generate the corrected depth point cloud of the current frame.
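A minimal sketch of this correction (an editorial illustration): assuming both point clouds keep one row per image pixel in the same order, so that "corresponding pixel" means the same row as in the back-projection sketch above, pixel points whose multipath-interference attribute exceeds the preset threshold are replaced by the corresponding previous-frame points transformed through the inter-frame pose transformation matrix.

def correct_depth_cloud(cloud_curr, cloud_prev, T_prev_to_curr, mpi_threshold):
    """cloud_*: (N, 4) arrays [X, Y, Z, mpi]; returns the corrected current-frame cloud."""
    corrected = cloud_curr.copy()
    bad = cloud_curr[:, 3] > mpi_threshold              # pixel points to be replaced
    pts_prev = cloud_prev[bad, :3]                      # corresponding previous-frame points
    # transform previous-frame points into the current frame to obtain replacement pixels
    corrected[bad, :3] = pts_prev @ T_prev_to_curr[:3, :3].T + T_prev_to_curr[:3, 3]
    return corrected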
It is worth mentioning that, in order to obtain key frames with high point cloud accuracy, initial key frames in which more than 90% of the pixel points in the depth point cloud have a multipath-interference attribute value lower than the preset threshold may be screened out as preselected frames; a preselected-frame pose transformation matrix is then generated based on the pose of the preselected current frame and the pose of the preselected previous frame, and the pixel points whose multipath-interference attribute value is higher than the preset threshold in the depth point cloud of the preselected current frame are corrected based on the preselected-frame pose transformation matrix and the depth point cloud of the preselected previous frame, so as to obtain the corrected depth point cloud of the preselected current frame.
In particular, pixel-point replacement is only one way of correcting, through the inter-frame pose transformation matrix and the depth point cloud of the previous frame, the pixel points whose multipath-interference attribute value is higher than the preset threshold in the depth point cloud of the current frame. In other examples of the present application, a weighted manner or other manners may be used to perform this correction through the inter-frame pose transformation matrix and the depth point cloud of the previous frame, which is not limited by the present application.
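For the weighted manner mentioned above, one possible form (purely an assumption for illustration; the patent does not detail it, and it presumes the MPI confidence is normalised to [0, 1]) blends the suspect pixel with the transformed previous-frame point instead of replacing it outright:

def correct_depth_cloud_weighted(cloud_curr, cloud_prev, T_prev_to_curr, mpi_threshold):
    corrected = cloud_curr.copy()
    bad = cloud_curr[:, 3] > mpi_threshold
    pts_prev = cloud_prev[bad, :3] @ T_prev_to_curr[:3, :3].T + T_prev_to_curr[:3, 3]
    w = cloud_curr[bad, 3:4]                 # higher MPI confidence -> trust the current pixel less
    corrected[bad, :3] = w * pts_prev + (1.0 - w) * cloud_curr[bad, :3]
    return corrected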
Through the correction of the depth point cloud, the error of the depth point cloud caused by multipath interference can be reduced, and the map construction and positioning precision is improved.
In step 3, in the process of generating a sequence of key frames based on SLAM visual tracking, errors inevitably exist in the data obtained for completing map construction and positioning; in particular, as the range of the target map grows, the accuracy of the data is increasingly affected by accumulated errors. Therefore, according to the actual scene and actual requirements, back-end optimization can be performed on the key frame sequence and/or the key frames after the key frame sequence is generated.
Accordingly, the composition method for a mobile robot based on the visual SLAM further includes: performing back-end optimization on the key frame sequence and/or the key frames; wherein the back-end optimization comprises at least one of the following steps: constructing a common-view constraint of the feature points based on the feature points of the RGB images of the key frames and a bag-of-words model; constructing pose constraints between consecutive key frames based on ICP matching; and performing global nonlinear optimization by bundle adjustment when a closed loop is generated.
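As an illustration of the second of these steps (an editorial sketch reusing the icp helper from the term definitions; the pose-graph solver that would consume the resulting edges is omitted), relative-pose constraints between consecutive key frames can be built from ICP matches of their corrected depth point clouds:

def build_icp_constraints(keyframes):
    """keyframes: hypothetical list of dicts with 'cloud' ((N, 4) corrected depth point cloud)."""
    edges = []
    for k in range(1, len(keyframes)):
        T_rel = icp(keyframes[k]["cloud"][:, :3], keyframes[k - 1]["cloud"][:, :3])
        edges.append((k - 1, k, T_rel))      # pose constraint between key frames k-1 and k
    return edges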
In a specific example of the present application, step 4: generating a target map based on the corrected depth point clouds and poses of all keyframes in the sequence of keyframes, including: generating a probability grid map based on corrected depth point clouds and poses of all key frames in the key frame sequence; generating a point cloud map based on corrected depth point clouds of all key frames in the key frame sequence; fusing the point cloud map and the probability grid map to generate the target map.
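A minimal sketch of the probability grid map part of step 4 (an editorial illustration; the cell size, map extent, and simple hit-count probability are assumptions): each point of each corrected depth point cloud is transformed into the world frame with the key frame's pose, and the occupancy of the grid cell it falls into is accumulated.

import numpy as np

def build_probability_grid(keyframes, cell=0.05, size=(400, 400), origin=(-10.0, -10.0)):
    """keyframes: hypothetical dicts with 'cloud' ((N, 4)) and 'pose' (4x4 world pose)."""
    hits = np.zeros(size)
    for kf in keyframes:
        pts = kf["cloud"][:, :3] @ kf["pose"][:3, :3].T + kf["pose"][:3, 3]  # to world frame
        ix = ((pts[:, 0] - origin[0]) / cell).astype(int)
        iy = ((pts[:, 1] - origin[1]) / cell).astype(int)
        ok = (ix >= 0) & (ix < size[0]) & (iy >= 0) & (iy < size[1])
        np.add.at(hits, (ix[ok], iy[ok]), 1)                                 # accumulate hits per cell
    return hits / max(hits.max(), 1.0)                                       # normalise to [0, 1]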
Exemplary positioning method
According to another aspect of the present application, there is also provided a positioning method for a mobile robot based on a visual SLAM. Fig. 4 illustrates a flowchart of a positioning method for a mobile robot based on a visual SLAM according to an embodiment of the present application.
As shown in fig. 4, the positioning method for a mobile robot based on a visual SLAM according to an embodiment of the present application includes: step 110: generating the target map based on the composition method for a mobile robot based on the visual SLAM as described above; step 120: generating a local map by the composition method for the mobile robot based on the visual SLAM, based on the local RGB image and the depth point cloud of the target environment acquired by the mobile robot; step 130: determining a repositioning initial pose of the mobile robot based on a nonlinear matching between the local map and the target map; step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined based on pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function; and step 150: inputting the estimated pose of the current frame and S image frames consecutive to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame.
Specifically, the step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined based on pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function, comprises: obtaining an estimated pose of the current frame based on a probability matching loss function, wherein the loss function is C(xp_k) = cost_imu + cost_wheel + cost_camera + cost_point; wherein xp_k represents the estimated pose; cost_imu represents an integral loss function of the first pose data corresponding to the current frame and the previous frame respectively, with the estimated pose as the estimated quantity; cost_wheel represents an integral loss function of the second pose data corresponding to the current frame and the previous frame respectively, with the estimated pose as the estimated quantity; cost_camera is a reprojection error of the feature points of the RGB image of the current frame on the point cloud map; and cost_point is a matching error between the corrected depth point cloud of the current frame and the probability grid map.
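A sketch of the cost_point term (an editorial illustration; the grid geometry follows the assumptions of the grid-map sketch above): the corrected depth point cloud of the current frame is transformed by the candidate pose and scored against the probability grid map, so points landing in low-occupancy cells contribute a large matching error.

import numpy as np

def cost_point(pose, cloud, grid, cell=0.05, origin=(-10.0, -10.0)):
    pts = cloud[:, :3] @ pose[:3, :3].T + pose[:3, 3]         # corrected cloud in map coordinates
    ix = ((pts[:, 0] - origin[0]) / cell).astype(int)
    iy = ((pts[:, 1] - origin[1]) / cell).astype(int)
    ok = (ix >= 0) & (ix < grid.shape[0]) & (iy >= 0) & (iy < grid.shape[1])
    prob = grid[ix[ok], iy[ok]]                               # occupancy probability per point
    return float(np.sum(1.0 - prob))                          # high occupancy -> low matching error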
Accordingly, the step 150: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame, comprises: inputting the estimated poses of the current frame and the S frame image frames continuous with the current frame into an optimization function, and calculating a marginalization matrix and a displacement vector of the optimization function to be used as the pose of the current frame; wherein the optimization function is C(xp_k) = Σ(cost_imu + cost_wheel + cost_camera + cost_point).
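A hedged sketch of step 150 follows: the per-frame composite costs of the current frame and the S frames in the window are summed and optimised jointly, and the oldest frame is then marginalised with a Schur complement on an approximate information matrix to yield a marginalization matrix. The BFGS optimiser, the 6-DoF pose vector and the use of the BFGS inverse-Hessian approximation are assumptions of the sketch, not the application's own formulation.

```python
# Assumed illustration of C(xp_k) = Σ(cost_imu + cost_wheel + cost_camera + cost_point)
# over a sliding window, with a crude marginalisation of the oldest frame.
import numpy as np
from scipy.optimize import minimize

def optimize_window(initial_poses, frame_costs, dof=6):
    """initial_poses: (S+1, dof) estimated poses in the window;
    frame_costs[i]: callable mapping the i-th pose to its composite scalar cost."""
    n = len(initial_poses)
    def C(x):                                    # sum of per-frame composite costs
        poses = x.reshape(n, dof)
        return sum(frame_costs[i](poses[i]) for i in range(n))
    res = minimize(C, np.asarray(initial_poses, dtype=float).ravel(), method="BFGS")
    poses = res.x.reshape(n, dof)
    # assumed marginalisation of the oldest frame: Schur complement of its block
    # in the inverse of the BFGS hess_inv approximation
    H = np.linalg.inv(res.hess_inv)
    H_mm, H_mr = H[:dof, :dof], H[:dof, dof:]
    H_rm, H_rr = H[dof:, :dof], H[dof:, dof:]
    marg_matrix = H_rr - H_rm @ np.linalg.inv(H_mm) @ H_mr
    return poses[-1], marg_matrix                # pose of the current frame, marginalization matrix
```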
Exemplary composition apparatus
According to another aspect of the present application, there is also provided a composition apparatus 10 for a mobile robot based on a visual SLAM. Fig. 5 illustrates a block diagram of a composition apparatus for a mobile robot based on a visual SLAM according to an embodiment of the present application.
As shown in fig. 5, the composition apparatus 10 for a mobile robot based on a visual SLAM according to an embodiment of the present application includes: an initialization unit 11, configured to perform step 1: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame; a data obtaining unit 12, configured to perform step 2: acquiring an RGB image of a target environment and a depth image corresponding to the RGB image through an RGB-D camera module, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference; a tracking unit 13, configured to perform step 3: generating a sequence of key frames based on SLAM visual tracking, wherein step 3 comprises: step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by an inertial sensor and corresponding to the current frame and the previous frame respectively; step 32: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame; step 33: generating an inter-frame pose transformation matrix based on the pose of the current frame and the pose of the previous frame; step 34: correcting pixel points of which the attribute values of multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the interframe pose transformation matrix and the depth point cloud of the previous frame to obtain the corrected depth point cloud of the current frame; step 35: iteratively performing steps 31-34 to obtain the key frame sequence, each key frame of the key frame sequence having a corrected depth point cloud; and a map generation unit 14 for executing step 4: generating a target map based on the corrected depth point clouds and poses of all key frames in the key frame sequence.
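By way of illustration of the tracking unit's step 34 (a sketch under assumptions, not the application's implementation), the snippet below screens the points of the current depth point cloud whose multipath-interference attribute exceeds a threshold and replaces them with the index-corresponding points of the previous frame re-expressed in the current frame through the inter-frame pose transformation matrix; the per-point attribute layout, the index-based pixel correspondence and the threshold value are assumptions.

```python
# Assumed sketch of step 34: multipath-interference correction of the current depth point cloud.
import numpy as np

def correct_depth_cloud(curr_points, curr_mpi, prev_points, T_curr_prev, threshold=0.5):
    """curr_points / prev_points: (N, 3) clouds with per-pixel correspondence by index;
    curr_mpi: (N,) multipath-interference attribute values;
    T_curr_prev: 4x4 inter-frame pose transformation matrix (current <- previous)."""
    corrected = curr_points.copy()
    bad = curr_mpi > threshold                               # pixel points to be replaced
    homo = np.hstack([prev_points[bad], np.ones((bad.sum(), 1))])
    replacement = (T_curr_prev @ homo.T).T[:, :3]            # previous-frame points in the current frame
    corrected[bad] = replacement                             # substitute the contaminated points
    return corrected                                         # corrected depth point cloud
```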
Exemplary positioning device
According to another aspect of the present application, there is also provided a positioning apparatus 20 for a mobile robot based on a visual SLAM. Fig. 6 illustrates a block diagram of a positioning device for a mobile robot based on a visual SLAM according to an embodiment of the present application.
As shown in fig. 6, the positioning apparatus 20 for a mobile robot based on a visual SLAM according to an embodiment of the present application includes: a map construction unit 21, configured to perform step 110: generating the target map based on the composition method for the mobile robot based on the visual SLAM; a local map building unit 22, configured to perform step 120: generating a local map, by the composition method for the mobile robot based on the visual SLAM, based on the local RGB image and the depth point cloud of the target environment acquired by the mobile robot; a repositioning initial pose generating unit 23, configured to perform step 130: determining a repositioning initial pose of the mobile robot based on a non-linear match between the local map and the target map; a pose estimation unit 24 for performing step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined based on pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function; and a nonlinear optimization unit 25, configured to perform step 150: inputting the estimated poses of the current frame and S frame image frames consecutive to the current frame into a sliding window for nonlinear optimization, so as to generate the pose of the current frame.
Exemplary robot
According to another aspect of the present application, a robot 80 is also provided. Fig. 7 illustrates a block diagram of a robot in accordance with an embodiment of the present application. As shown in fig. 7, the robot 80 according to the embodiment of the present application includes: a robot main body 81; an RGB-D camera module 82 mounted on the robot main body; an inertial sensor 83 attached to the robot main body; and a data processing device 84 communicably connected to the RGB-D camera module and the inertial sensor, wherein the data processing device is configured to execute the visual SLAM-based composition method for the mobile robot or the visual SLAM-based positioning method for the mobile robot.

Claims (11)

1. A composition method for a mobile robot based on a visual SLAM, comprising:
step 1: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame;
step 2: acquiring, through an RGB-D camera module, an RGB image of a target environment and a depth image corresponding to the RGB image, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference;
step 3: generating a sequence of key frames based on SLAM visual tracking, wherein step 3 comprises:
step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by an inertial sensor and corresponding to the current frame and the previous frame respectively;
step 32: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame;
step 33: generating an inter-frame pose transformation matrix based on the pose of the current frame and the pose of the previous frame;
step 34: correcting pixel points of which the attribute values of multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the interframe pose transformation matrix and the depth point cloud of the previous frame to obtain corrected depth point cloud of the current frame;
step 35: iteratively performing steps 31-34 to obtain the key frame sequence, each key frame of the key frame sequence having a corrected depth point cloud; and
and step 4: generating a target map based on the corrected depth point clouds and the poses of all key frames in the key frame sequence.
2. The vision SLAM-based composition method for a mobile robot of claim 1, wherein the inertial sensors comprise an inertial measurement unit and a wheel encoder, the pose data comprising first pose data obtained by the inertial measurement unit and second pose data obtained by the wheel encoder;
wherein, step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by inertial sensors corresponding to the current frame and the previous frame respectively, comprising:
obtaining a fusion pose corresponding to the current frame based on a loss function, wherein the loss function is C(x_k) = cost_imu + cost_wheel + cost_camera;
wherein cost_imu represents an integral loss function, with the estimated pose as an estimator, of the first pose data corresponding to the current frame and the previous frame respectively; cost_wheel represents an integral loss function, with the estimated pose as an estimator, of the second pose data corresponding to the current frame and the previous frame respectively; and cost_camera is a reprojection error generated based on the feature points of the RGB image of the current frame.
3. The vision SLAM-based composition method for mobile robots as claimed in claim 2, wherein step 32: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame, comprises:
inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into an optimization function as described below, and calculating a marginalization matrix and a displacement vector of the optimization function to be used as the pose of the current frame;
wherein the optimization function is C(x_i) = Σ(cost_imu + cost_wheel + cost_camera).
4. The vision SLAM-based composition method for mobile robots as claimed in claim 3, wherein step 34: based on the inter-frame pose transformation matrix and the depth point cloud of the previous frame, correcting the pixel points of which the attribute values of the multipath interference in the depth point cloud of the current frame are higher than a preset threshold value to obtain the corrected depth point cloud of the current frame, comprises:
screening out pixel points to be replaced, of which the attribute values of the multipath interference in the depth point cloud of the current frame are higher than a preset threshold value;
converting pixel points corresponding to the pixel points to be replaced in the depth point cloud of the previous frame into replacement pixel points through the inter-frame pose transformation matrix; and
replacing the pixel points to be replaced in the depth point cloud of the current frame with the replacement pixel points to generate the corrected depth point cloud of the current frame.
5. The vision SLAM-based composition method for mobile robots as claimed in claim 4, wherein step 4: generating a target map based on the corrected depth point clouds and poses of all keyframes in the sequence of keyframes, including:
generating a probability grid map based on corrected depth point clouds and poses of all key frames in the key frame sequence;
generating a point cloud map based on corrected depth point clouds of all key frames in the key frame sequence;
and fusing the point cloud map and the probability grid map to generate the target map.
6. The composition method for a mobile robot based on a visual SLAM according to any one of claims 1 to 5, wherein step 3 further comprises: performing back-end optimization on the key frame sequence and/or the key frame;
wherein the back-end optimization comprises at least one of the following:
constructing a common-view constraint of the feature points based on the feature points of the RGB images of the key frames and a bag-of-words model;
constructing pose constraints among the continuous key frames based on ICP matching; and
when a closed loop is generated, performing global nonlinear optimization by a bundle adjustment method.
7. The visual SLAM-based composition method for mobile robots as claimed in any one of claims 1 to 5, wherein the attribute value of the multipath interference is a confidence of the multipath interference.
8. A positioning method for a mobile robot based on visual SLAM, comprising:
step 110: generating the target map based on the composition method for a mobile robot based on the visual SLAM of any one of claims 1 to 7;
step 120: generating a local map based on the local RGB image and depth point cloud of the target environment acquired by the mobile robot by the composition method for the mobile robot based on the visual SLAM according to any one of claims 1 to 7;
step 130: determining a repositioning initial pose of the mobile robot based on a non-linear matching between the local map and the target map;
step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined by pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function; and
step 150: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame.
9. A composition apparatus for a mobile robot based on a visual SLAM, comprising:
an initialization unit for performing step 1: initializing to generate an initial key frame and an initial pose corresponding to the initial key frame;
a data acquisition unit for performing step 2: acquiring, through an RGB-D camera module, an RGB image of a target environment and a depth image corresponding to the RGB image, and converting the depth image into a depth point cloud, wherein each pixel point of the depth point cloud has an attribute value of multipath interference;
a tracking unit for performing step 3: generating a sequence of key frames based on SLAM visual tracking, wherein step 3 comprises:
step 31: obtaining an estimated pose corresponding to a current frame based on feature points of an RGB image of the current frame and feature points of an RGB image of a previous frame adjacent to the current frame and pose data provided by inertial sensors and corresponding to the current frame and the previous frame respectively;
step 32: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame;
step 33: generating an inter-frame pose transformation matrix based on the pose of the current frame and the pose of the previous frame;
step 34: correcting pixel points of which the attribute values of multipath interference are higher than a preset threshold value in the depth point cloud of the current frame based on the interframe pose transformation matrix and the depth point cloud of the previous frame to obtain corrected depth point cloud of the current frame;
step 35: iteratively performing steps 31-34 to obtain the sequence of keyframes, each keyframe of the sequence of keyframes having a corrected depth point cloud; and
a map generation unit for executing step 4: generating a target map based on the corrected depth point clouds and poses of all key frames in the key frame sequence.
10. A positioning device for a mobile robot based on a visual SLAM, comprising:
a map construction unit for performing step 110: generating the target map based on the composition method for a mobile robot based on the visual SLAM of any one of claims 1 to 7;
a local map building unit, configured to perform step 120: generating a local map based on the local RGB image and depth point cloud of the target environment acquired by the mobile robot by the composition method for the mobile robot based on the visual SLAM according to any one of claims 1 to 7;
a repositioning initial pose generating unit configured to perform step 130: determining a repositioning initial pose of the mobile robot based on a non-linear matching between the local map and the target map;
a pose estimation unit for performing step 140: constructing a nonlinear optimization function based on a reprojection error of feature points of an RGB image of a current frame on the point cloud map, a matching error between the corrected depth point cloud of the current frame and the probability grid map, and an inertial pose error determined based on pose data provided by the inertial sensor and respectively corresponding to the current frame and the previous frame, and obtaining an estimated pose of the current frame through the nonlinear optimization function; and
and a nonlinear optimization unit for performing step 150: inputting the estimated poses of the current frame and S frame image frames continuous to the current frame into a sliding window for nonlinear optimization to generate the pose of the current frame.
11. A robot, comprising:
a robot main body;
the RGB-D camera module is arranged on the robot main body;
an inertial sensor mounted to the robot body; and
a data processing device communicably connected to the RGB-D camera module and the inertial sensor, wherein the data processing device is configured to execute the visual SLAM-based composition method for a mobile robot according to any one of claims 1 to 7, or the visual SLAM-based positioning method for a mobile robot according to claim 8.
CN202110178063.6A 2021-02-07 2021-02-07 Composition method for mobile robot based on visual SLAM Pending CN114913224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110178063.6A CN114913224A (en) 2021-02-07 2021-02-07 Composition method for mobile robot based on visual SLAM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110178063.6A CN114913224A (en) 2021-02-07 2021-02-07 Composition method for mobile robot based on visual SLAM

Publications (1)

Publication Number Publication Date
CN114913224A true CN114913224A (en) 2022-08-16

Family

ID=82762153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110178063.6A Pending CN114913224A (en) 2021-02-07 2021-02-07 Composition method for mobile robot based on visual SLAM

Country Status (1)

Country Link
CN (1) CN114913224A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843808A (en) * 2023-06-30 2023-10-03 北京百度网讯科技有限公司 Rendering, model training and virtual image generating method and device based on point cloud


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination