CN107861508B

CN107861508B - Local motion planning method and device for mobile robot

Info

Publication number: CN107861508B
Application number: CN201710987041.8A
Authority: CN
Inventors: 刘越江; 陈子冲
Original assignee: Ninebot Beijing Technology Co Ltd
Current assignee: Ninebot Beijing Technology Co Ltd
Priority date: 2017-10-20
Filing date: 2017-10-20
Publication date: 2021-04-20
Anticipated expiration: 2037-10-20
Also published as: CN107861508A; WO2019076044A1

Abstract

The invention discloses a local motion planning method for a mobile robot, which comprises the following steps: determining a planar (2d) local cost map image; determining a velocity of the mobile robot; formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image for execution by the mobile robot. The invention also discloses a local motion planning device of the mobile robot.

Description

Local motion planning method and device for mobile robot

Technical Field

The invention relates to the technical field of robots, in particular to a method and a device for planning local motion of a mobile robot.

Background

The obstacle avoidance motion planning is one of the basic skills that an intelligent mobile robot is expected to master. Various algorithms have been developed over the last decade to enable robots to plan to target points or follow reference paths without hitting the trajectory of an obstacle. Despite significant advances, mobile robots are far from human mobility planning. For example, humans make motion decisions quickly with negligible effort, adapt robustly to uncertainty and unforeseen obstacles, and behave very smoothly and naturally. Given sufficient localized and Global path information, such as Global Positioning System (GPS) and Google (Google) maps, people navigate under different conditions with powerful planning decision-making capabilities, whether during walking, running or driving. In contrast, mobile robots are also striving to achieve these.

One class of high performance planning algorithms addresses the local movement planning and obstacle avoidance issues from an optimization perspective. The main feature of these algorithms is to minimize the cost function (or maximize the utility function) of the underlying trajectory. The optimal trajectory is then repeatedly fed to the robot controller at each cycle together with the corresponding control strategy. This method is mathematically interpretable. However, solving the optimization problem is a challenge: since the objective function involves a robot dynamic model, and constraints may consist of components related to complex geometry, the optimization problem is typically non-linear and difficult to solve in real-time on consumer robots with limited computational resources. In order to reduce the computation time, researchers have proposed various methods, such as sampling and linearization methods, which however all lead to degradation into sub-optimal solutions.

In addition to optimization-based approaches, emulation Learning (IL) is becoming an emerging technology that addresses planning and control issues. The core idea of mock learning is to learn strategies from expert demonstrations by cloning expected behaviors or understanding the rewards pursued by the demonstrator. One fast growing branch in mock learning is the end-to-end approach, mapping raw sensory inputs directly to motor outputs without manual programming. End-to-end model learning systems have been developed for a wide range of applications and with exciting results through Convolutional Neural Networks (CNN). However, the performance of most existing CNN-based planners or controllers is significantly lower than that of policy demonstrations, in part because standard CNNs are not sufficient to express planning-based reasoning. Another challenge of mock learning is the mismatch in state distributions between the demonstrator and the learning strategy, which can result in trapping in suboptimal, even divergent trajectories when using the learning strategy. Some approaches attempt to solve this problem by iteratively collecting training samples that are closer to the learned strategy. However, scarcity of catastrophic events in the training dataset remains a real operational risk.

Therefore, how to make the mobile robot efficiently avoid obstacles remains a technical problem to be solved urgently.

Disclosure of Invention

In view of the above, the present invention is intended to provide a method and an apparatus for planning local motion of a mobile robot, and provide a new simulation learning method for planning local motion of a mobile robot and avoiding an obstacle, which can efficiently avoid an obstacle, accelerate a decision of planning local motion of a mobile robot, and simultaneously optimize, secure and generally make the decision as much as possible.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

in a first aspect, an embodiment of the present invention provides a method for planning local motion of a mobile robot, where the method includes:

determining a planar (2d) local cost map (Costmap) image;

determining a velocity of the mobile robot;

formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image for execution by the mobile robot.

In the foregoing solution, optionally, the determining a 2d local cost map image includes:

acquiring data acquired by a preset sensor on the mobile robot;

positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;

determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;

and determining a 2d local cost map image according to the local target point and the local obstacle map.

In the foregoing solution, optionally, the learning-based planner learns the local motion planning strategy by:

given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):

u＝f_θ(m,u') (1)

where u ═ is a vector of linear velocity v and angular velocity w to be executed, u' is a velocity vector of the mobile robot, θ is a model weight parameter, and m is a 2d local cost map image.

In the foregoing solution, optionally, the determining the 2d local cost map includes:

determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:

m＝λm_goal-m_obs (2)

wherein m is_obsLocal obstacle map being the probability of an obstacle, m_goalIs a binary target map, wherein, in the binary target map, the value of the pixel of the local target point is set as 1, the values of other pixels are set as 0, and lambda is a hyper-parameter related to the reward coefficient; and if the nearest effective sub-target point on the reference path is positioned outside the 2d local cost map window, replacing the effective target point with a projection point from the effective target point to a map frame.

inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.

In the foregoing solution, optionally, the determining the model weight parameter θ includes:

given a set of exemplary action instructions

The planner trains the required local motion planning strategy based on the error minimization criterion:

wherein the content of the first and second substances,

is an exemplary tuple, J (θ) represents an error function, is an exemplary action instruction

And the actual action command f_θ(m_i,u'_i) A cumulative sum of squares of absolute values of the differences;

once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.

In the foregoing solution, optionally, the formulating, by the learning-based planner, an action instruction for the mobile robot includes:

inputting a 2d local cost map image to a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operation of strategy improvement and truncation strategy evaluation;

wherein s and r correspond to the position and cost of the pixel in the 2d local cost map, respectively, and p (s', r/s, u) represents the transition probability; k represents the number of cycles; gamma represents an attenuation factor; v. of_k+1(s) represents the value at position s obtained after k +1 iterations of the mobile robot; v. of_k(s ') represents the value of the mobile robot at position s' after k iterations; u represents an action command executed by the mobile robot; s' represents the position of a pixel in the 2d local cost map after the mobile robot executes the action command u at the position s;

determining a high-level planning feature from the region of interest from the high-level planning features extracted by formula (4);

and fusing the high-level planning features from the attention area with the speed of the mobile robot, and providing the fusion result to a complete connection layer which can generate smooth continuous action commands in a deep neural network so as to obtain action instructions formulated for the mobile robot.

In the foregoing scheme, optionally, the method further includes:

selecting an optimization-based planner that can provide a preset number of computing resources as a demonstrator;

the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;

x_k+1＝h(x_k,u),k＝0,...,N-1 (5b)

where N is the length of the prediction horizon, x_kIs the 2d pose, d, of the mobile robot at time step k_gIs the distance, alpha, between the mobile robot and the local target point at a time step N_gIs the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h (x)_kU) is a kinematic model of the robot,

is the maximum probability of obstruction, w, of access being allowed₁，w₂，w₃Is a cost weight parameter.

In the foregoing solution, optionally, the acquisition source of the training samples of the learning-based planner includes:

generating a first cost map according to a normal experiment track of the demonstrator; wherein the normal experimental trajectory of the demonstrator is a trajectory expected to be encountered by the learning-based planner within a predetermined proportional range of probability;

a second cost map generated by human intervention, containing unexpected dangerous situations of the demonstrator, for compensating for small probability events that the demonstrator overlooked the learning-based planner will encounter.

In a second aspect, an embodiment of the present invention provides a local motion planning apparatus for a mobile robot, where the apparatus includes:

the preprocessor is used for determining a plane 2d local cost map image; determining a velocity of the mobile robot;

a controller for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot.

In the foregoing scheme, optionally, the preprocessor is specifically configured to:

acquiring data acquired by a preset sensor on the mobile robot;

In the foregoing scheme, optionally, the controller is specifically configured to:

u＝f_θ(m,u') (1)

m＝λm_goal-m_obs (2)

wherein m is_obsLocal obstacle map being the probability of an obstacle, m_goalIs a binary target map, wherein, in the binary target map, the pixel value of the local target point is set as 1, the other pixel values are set as 0, and lambda is a hyper-parameter related to the reward coefficient; and if the nearest effective target point on the reference path is positioned outside the 2d local cost map window, using the projection point of the effective target point to the map borderReplacing the valid target point.

In the foregoing solution, optionally, the manner for determining the model weight parameter θ by the controller includes:

given a set of exemplary action instructions

wherein the content of the first and second substances,

inputting a 2d local cost map image to a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operation;

In the foregoing solution, optionally, the controller is further configured to:

x_k+1＝h(x_k,u),k＝0,...,N-1 (5b)

where N is the length of the prediction horizon, x_kIs the mobile at time step k2d pose of the robot, d_gIs the distance, alpha, between the mobile robot and the local target point at a time step N_gIs the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h (x)_kU) is a kinematic model of the robot,

In the foregoing solution, optionally, the controller is further configured to:

a second cost map generated by human intervention, containing unexpected dangerous situations of the demonstrator, for compensating the small probability events which will be encountered by the learning-based planner and ignored by the demonstrator;

and using the first cost map and the second cost map as training samples of the learning-based planner.

In a third aspect, an embodiment of the present invention provides a computer storage medium, where a computer program is stored in the computer storage medium, where the computer program is used to execute the local motion planning method for a mobile robot described above.

The invention provides a local motion planning method and a local motion planning device for a mobile robot, which are used for determining a 2d local cost map image; determining a velocity of the mobile robot; based on the speed and the 2d local cost map image, an action instruction is made for the mobile robot through a learning-based planner, so that the mobile robot executes the action instruction, obstacles can be efficiently avoided, a local motion planning decision of the mobile robot is accelerated, and the made decision is optimized, safe and universal as much as possible.

Drawings

Fig. 1 is a schematic flow chart of an implementation of a local motion planning method for a mobile robot according to the present invention;

fig. 2 is a schematic structural diagram of a local motion planning apparatus for a mobile robot according to the present invention;

FIG. 3 is a block diagram of a local mobility planning system with a policy network according to the present invention;

FIG. 4 is a schematic diagram of a locally planned deep neural network provided by the present invention;

FIG. 5 is a schematic diagram of a Sagnerwi delivery robot provided in the present invention;

FIG. 6 is a block diagram of the absolute prediction error of a trained planner versus an exemplar provided by the present invention;

FIG. 7 is a diagram of the action instructions provided by the trained planner and demonstration machine of the present invention for three cases within different error ranges of a test data set;

FIG. 8 is a schematic diagram illustrating a comparison of the optimal gap between a learning-based planner and an optimization-based planner provided by the present invention;

FIG. 9 is a schematic diagram illustrating a comparison of computation time between a learning-based planner and an optimization-based planner provided by the present invention;

FIG. 10 is a navigation diagram in a simulation environment provided by the present invention;

FIG. 11 is a schematic diagram of the response of the trained planner provided by the present invention to an unexpected obstacle on a reference path in a real-world experiment;

FIG. 12 is a schematic diagram of a navigation trajectory from a real-world environment provided by the present invention.

Detailed Description

In order to better explain the present invention, some prior art studies on simulation learning are described below.

Currently, research on mimic learning is widely focused on end-to-end approaches. For example, the convolutional neural network model is trained to map visual input images to left/right steering commands for grounded mobile robots, quad-rotor aircraft, and autopilots. Another recent study has proposed a target driven movement planner that generates linear and angular velocities from the raw input of the laser rangefinder. Despite these advances, the effectiveness of network models in the planning context remains a challenge in existing policy emulation approaches.

To improve planning-based reasoning, new neural network architectures have recently been investigated in several works. An end-to-end architecture called predictor (predictron) is introduced, whose core is an abstract model represented by the markov reward process, which can deduce a value estimate. Another deep neural network architecture with similar potential motivations is a value iterative network that includes a special module for recursive value iteration. The learner developed a cyclic network as a representation of path integral optimal control to learn cost and dynamic models.

Another practical challenge of mock learning is the mismatch in data distribution between the exemplary and learned strategies. To address this challenge, researchers have proposed an iterative data aggression method and applied it to learning the reactive controllers of micro-aircraft. This approach is further extended to SafeDAgger, which trains a safe strategy to prevent the learning process from falling into a dangerous state, while reducing the frequency of human intervention. Another recent article addresses this problem by gradually filling the allocation gap between demonstration and learned strategies using an adaptive model predictive controller as a demonstrator for adjusting its strategy.

However, the scarcity of catastrophic events in the training data set of these approaches remains a drawback in the safety-to-top application.

Based on the above, the invention provides a simulation learning method for local motion planning and obstacle avoidance of a mobile robot. The main goal is to speed up the local motion planning decisions made by the mobile robot, while making the decisions as optimal, safe and versatile as possible.

The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.

The embodiment of the invention provides a local motion planning method for a mobile robot, which mainly comprises the following steps of:

and step 101, determining a plane (2d) local cost map image.

Optionally, the determining a 2d local cost map image includes:

acquiring data acquired by a preset sensor on the mobile robot;

Here, the predetermined sensor may be one or a plurality of sensors. For example, the predetermined sensor may be a camera or a video camera on the mobile robot.

And 102, determining the speed of the mobile robot.

As one embodiment, the determining the velocity of the mobile robot includes:

determining first position information of the mobile robot at a first moment;

determining second position information of the mobile robot at a second moment; wherein the first time is a time before the second time;

and determining the speed of the mobile robot according to the first position information, the second position information, the first time and the second time.

For example, the speed of the mobile robot is (second position information — first position information)/(second time-first time).

Of course, the manner of determining the speed of the mobile robot is not limited to the above-listed form, and may be determined by other means. For example, data is directly obtained from a speed sensor of the mobile robot, and the speed of the mobile robot is determined.

And 103, formulating action instructions for the mobile robot through a learning-based planner based on the speed and the 2d local cost map image so as to execute the action instructions by the mobile robot.

Here, the action instruction includes:

the linear velocity and the angular velocity that the mobile robot needs to execute.

In this way, the mobile robot can effectively avoid obstacles by executing the motion command determined in the above manner, and the time required for the motion command determined in the above manner is short.

The learning-based planner is a trained planner and can rapidly and accurately make action instructions for the mobile robot.

Illustratively, the learning-based planner learns a local motion planning strategy by:

u＝f_θ(m,u') (1)

where u ═ v, w are vectors of linear velocity v and angular velocity w to be executed, u' is a velocity vector of the mobile robot, θ is a model weight parameter, and m is a 2d local cost map image; wherein the cost map may be determined in a variety of ways.

In the above scheme, one of the determination manners of the 2d local cost map is as follows:

determining a cost map according to equation (2), wherein the cost map is constructed as a sum of local target rewards and obstacle penalties:

m＝λm_goal-m_obs (2)

wherein m is_obsLocal obstacle map being the probability of an obstacle, m_goalIs a binary target map in which the pixel value of a local target point is set to 1, the other pixel values are set to 0, and λ is a prize-related valueA hyperparameter of the excitation coefficient; and if the nearest effective target point on the reference path is positioned outside the 2d local cost map window, replacing the effective target point with a projection point from the effective target point to a map frame.

In the above scheme, the second determination method of the 2d local cost map is as follows:

In the above scheme, the determining the model weight parameter θ includes:

given a set of exemplary action instructions

wherein the content of the first and second substances,

In the above solution, the formulating an action instruction for the mobile robot by the learning-based planner includes:

and fusing the high-level planning features from the attention area with the speed of the mobile robot, and providing the fusion result to a complete connection layer which can generate smooth continuous action commands in a deep neural network so as to obtain action instructions formulated for the mobile robot, so that the mobile robot can simulate expected behaviors and avoid obstacles in local motion planning.

Here, the attention area is a partial area on the cost map.

Here, the feature extracted from the region of interest, for example, only the feature that coincides with the direction in which the target point is oriented, contributes to quick acquisition of the motion instruction.

In the above scheme, the method further comprises:

selecting an optimization-based planner that can provide a preset number of computing resources as a demonstrator; here, the preset number may be understood as a large number;

x_k+1＝h(x_k,u),k＝0,...,N-1 (5b)

Illustratively, the acquisition sources of training samples for the learning-based planner include:

Here, the probability range of the predetermined ratio may be understood as: most of the time the learning based planner will encounter such a trajectory.

Optionally, the method for generating the second cost map includes:

firstly, randomly generating a preset number of binary barrier clusters;

and then, carrying out Gaussian fuzzy processing on the binary obstacle clusters, and carrying out probability conversion on the obstacle map.

And finally, randomly drawing the local target points on the obstacle map.

The method for planning the local motion of the mobile robot can efficiently avoid obstacles, quicken the speed of the mobile robot for making a local motion planning decision, and simultaneously make the decision as optimized, safe and universal as possible.

Example two

The embodiment provides a local motion planning device for a mobile robot, which comprises:

a preprocessor 10 for determining a planar 2d local cost map image; determining a velocity of the mobile robot;

a controller 20 for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot.

In the above solution, the mobile robot includes a learning-based planner.

As an embodiment, the preprocessor 10 is specifically configured to:

acquiring data acquired by a preset sensor on the mobile robot;

Specifically, how the learning-based planner learns the local motion planning strategy, how to select the demonstrator, how to obtain the training sample, and other processing manners can be realized by referring to corresponding contents in the mobile robot local motion planning method; and will not be described in detail herein.

As an embodiment, the controller 20 is specifically configured to:

u＝f_θ(m,u') (1)

As an embodiment, the preprocessor 10 is specifically configured to:

m＝λm_goal-m_obs (2)

wherein m is_obsLocal obstacle map being the probability of an obstacle, m_goalIs a binary target map, wherein, in the binary target map, the pixel value of the local target point is set as 1, the other pixel values are set as 0, and lambda is a hyper-parameter related to the reward coefficient; and if the nearest effective target point on the reference path is positioned outside the 2d local cost map window, replacing the effective target point with a projection point from the effective target point to a map frame.

As an embodiment, the preprocessor 10 is specifically configured to:

As an embodiment, the manner in which the controller 20 determines the model weight parameter θ includes:

given a set of exemplary action instructions

wherein the content of the first and second substances,

As an embodiment, the controller 20 is specifically configured to:

wherein s and r correspond to the position and cost of the pixel in the 2d local cost map, respectively, and p (s', r/s, u) represents the transition probability; k represents the number of cycles; gamma represents an attenuation factor; v. of_k+1(s) represents the value at position s obtained after k +1 iterations of the mobile robot; v. of_k(s ') represents the value of the mobile robot at position s' after k iterations; u represents an action command executed by the mobile robot; s' represents the local cost of the mobile robot at 2d after executing the motion command u at the position sThe location of a pixel in the map;

As an embodiment, the controller 20 is further configured to:

x_k+1＝h(x_k,u),k＝0,...,N-1 (5b)

As an embodiment, the controller 20 is further configured to:

Those skilled in the art should understand that the functions implemented by each processing module in the mobile robot local motion planning apparatus shown in fig. 2 can be understood by referring to the related description of the mobile robot local motion planning method. Those skilled in the art will appreciate that the functions of each processing unit in the local motion planning apparatus for a mobile robot shown in fig. 2 can be implemented by a program running on a processor, and can also be implemented by a specific logic circuit.

The local motion planning device for the mobile robot can enable the mobile robot to efficiently avoid obstacles, quickly make local motion planning decisions, and simultaneously enable the made decisions to be optimized, safe and universal as much as possible.

EXAMPLE III

Based on the method and the device for planning the local motion of the mobile robot in the first embodiment and the second embodiment, a method for planning the local motion of the mobile robot and avoiding obstacles, which is proposed by deep simulation learning, is given below. The main goal is to speed up the mobile robot making local motion planning decisions, while making the decisions as optimal, safe and versatile as possible.

A. System architecture

Fig. 3 is a block diagram of a local movement planning system with a policy network, and as can be seen from fig. 3, the system mainly includes two large planning blocks, the first planning block is used for preprocessing raw sensing data and generating a local occupancy map describing surrounding obstacles and local target points extracted from a global path according to the posture of the robot. These intermediate results are then fed to a second planning block where we use a deep neural network to simulate a local planning strategy. In addition, we also provide the speed of the robot as a network input to improve the smoothness of the sequential decision. During deployment, the proposed neural network strategy generates action commands by performing feed-forward calculations at each sample time, and is therefore computationally efficient and easy to process for real-time decision-making.

B. Problem formulation and how to learn local motion planning strategies

Given the 2d local cost map image and the mobile robot speed, the local motion planning strategy provides the following action commands:

u＝f_θ(m,u') (1)

where u ═ v, w are vectors of linear velocity v and angular velocity w to be executed, u' is the latest velocity vector of the mobile robot, θ is a model weight parameter, and m is a 2d local cost map image; wherein the cost map may be determined in a variety of ways.

Specifically, the cost map is constructed as the sum of local target reward and obstacle penalty:

m＝λm_goal-m_obs (2)

wherein m is_obsLocal obstacle map being the probability of an obstacle, m_goalIs a binary target map, wherein, in the binary target map, the value of the pixel of the local target point is equal to 1, the others are 0, and lambda is a hyper-parameter related to the reward coefficient; and if the nearest effective target point on the reference path is positioned outside the 2d local cost map window, replacing the effective target point with a projection point from the effective target point to a map frame.

It should be noted that the cost map can also be determined by other ways:

inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.

Note that the cost map can also be learned within the neural network by adding an additional convolutional layer in front of the network. However, in our empirical experiments we observed that the inference precision differed very little. Two possible reasons may be: the cost map calculated from equation (2) does represent the essence of the cost function of the demonstrator, and the cost function is finally learned and adapted to the cost map. In this work, we removed the convolutional layer of cost learning, aiming to reduce model redundancy.

Given a set of exemplary action instructions

wherein the content of the first and second substances,

C. Neural network model, representing f by using the neural network model_θ(m_i,u'_i)

wherein s and r correspond to the position and cost of the pixel in the cost map, respectively, and p (s', r/s, u) represents the transition probability; k represents the number of cycles; gamma represents an attenuation factor, and the value of gamma is 0-1; v. of_k+1(s) represents the value at position s obtained after k +1 iterations of the mobile robot; v. of_k(s') represents the velocity of the mobile robot at the kth cycle; representing the value of the mobile robot at the position s' after k iterations; u represents an action command executed by the mobile robot; s' represents the position of a pixel in the 2d local cost map after the mobile robot executes the action command u at the position s;

determining a high-level planning feature from the region of interest, such as a Q value in fig. 4, from the high-level planning features extracted by formula (4);

advanced planning features from the area of interest are fused with the speed of the mobile robot and the fused results are provided to a fully connected layer that can generate smooth continuous motion commands to mimic expected behavior and avoid obstacles in local motion planning.

Here, the region of interest is a partial region on the 2d local cost map.

Fig. 4 shows a schematic diagram of a locally planned deep neural network, in fig. 4, the parameters in the convolutional layer represent filter size, depth and stride. The parameters in the fully connected layers indicate the size of the output unit. In the experiment, the loop number K can be selected as 36, and how to select the loop number K is related to the size of the local cost map image.

D. Demonstrator based on optimization

Since the main goal of the proposed method is to learn the optimal strategy, the natural choice of the demonstrator is an optimization-based planner provided with a large amount of computational resources. In the present invention, we use a set of developed local planners as demonstrator, which tries to minimize the cost at each sampling time, specifically, the demonstrator determines the motion command for the mobile robot according to the formulas (5a), (5b), (5c), wherein the formulas (5b) and (5c) are constraint conditions;

x_k+1＝h(x_k,u),k＝0,...,N-1 (5b)

is the maximum probability of obstruction, w, of access being allowed₁，w₂，w₃Is a cost weight parameter. To simplify the calculation, the control action is assumed to be unchanged within the prediction range.

During the robot navigation, the solution according to equations (5a), (5b), (5c) needs to be repeated at each sampling time. However, the term m_obs(x_k) And g (x)_kU) may involve non-linearities that make the problem difficult to solve effectively. Conventional approaches use a sampling-based solver, which often compromises an approximate sub-optimal solution. At each planning cycle, some potential trajectories within the prediction horizon are first generated and then evaluated according to the associated cost values. The quality of the resulting solution is therefore of great rangeDepending on the number and differences of candidate trajectories considered. For behavioral demonstration purposes, we employ a large sample planner to provide a near-optimal reference strategy.

E. Data acquisition

As previously mentioned, an open challenge of mock learning is a state assignment mismatch between the training data set and the learned strategy. To effectively solve this problem and to cover as completely as possible the observations that may occur, we collected training samples from two sources. The first is a normal experimental trajectory demonstrated by an expert, which is expected to be encountered most of the time by a trained planner. The second is a random cost map generated artificially to feed dangerous situations that the demonstrator rarely encounters. The random cost map generation process is as follows: some binary obstacle clusters are first randomly generated and then gaussian blurred for probability transformation of the obstacle map. Subsequently, the local target points are randomly drawn on the map. Finally, we convert the cost map into a robot coordination system so that the poses of the robots in all data sets are the same, which is considered to be beneficial to sample efficiency.

The invention develops a simulated learning algorithm, realizes the real-time approximate optimal local motion planning, and simultaneously keeps better safety and universality for the application of the mobile robot. Unlike end-to-end modeling, we developed a local planning strategy based on a preprocessed 2d local cost map as input. The local cost map may be constructed from local target points and obstacle maps containing multiple frames of information received from sensor devices such as lidar, sonar and depth cameras. At each sample time, our local motion planning model embedded in the value iterative network produces an action instruction through feed forward reasoning, which is computationally efficient and enables planning-based reasoning. To train a robust model, we use a combination of real-world local obstacle maps and randomly generated artificial maps collected from demonstration experiments, which not only speeds up the data collection process, but also complements dangerous observation samples that are rarely encountered in demonstration. A brief comparison of the methods presented herein and existing learning methods is summarized in table 1.

Table 1: comparison of local movement planning methods between local 2d mimic learning, existing end-to-end mimic learning, and traditional optimization-based methods.

The main contributions of the present application include at least:

1. the local motion planning is expressed as a simulated learning problem based on the pre-processed 2d local cost map image. This form of learning may incorporate Value Iteration Networks (VINs) and is common to a wide variety of sensor devices.

2. By randomly generating an artificial local cost map, the problem of a shortage of training data, particularly a scarce but dangerous sample of events, is overcome.

In order to better illustrate the effectiveness and feasibility of the local motion planning method for the mobile robot, the local motion planning method for the mobile robot can be applied to a robot platform for experiments.

In the following, the experiments and evaluation of the local planner are described in detail.

A. Equipment

First, the robot platform used in the experiment was a delivery robot (Loomo Go), a Segway delivery robot (Segway delivery robot) equipped with an Intel real sensor (Intel real sensor), an ultrasonic sensor, and a wheel encoder. The depth camera in RealSense is used to maintain a fixed size 2.8m of the partial occupancy map with a resolution of 0.1 m.

Second, for model training, we have collected over 600k tuples in total, half from the exemplary trajectory and half from random generation. The data set was divided into a training set (80%) and a test set (20%). Here, the ratio of the training set to the test set may be adjusted according to actual requirements. The exemplary uses an optimization-based planner that samples 11 grid points for linear velocities in the range of 0.0m/s, 0.5m/s and 81 grid points for angular velocities in the range of-0.8 rad/s, 0.8 rad/s. The neural network model was implemented using a deep learning system (Tensorflow) framework and trained ab initio on an Ingland-Tatan (Nvidia Titan X) for about 8 hours with an Adam optimizer; wherein Adam is an algorithm for optimizing a random objective function based on a first order gradient; evaluation and deployment were performed on a laptop (laptop) using an Intel i7-6700HQ CPU, Ubuntu 14.04 operating system. FIG. 5 shows a schematic diagram of a SageWis delivery robot in FIG. 5 equipped with Intel Realsense ZR300(30Hz RGB-depth, FishEye and IMU), Intel Atom Z8750(4 cores, 2.4GHz) and 4GB memory; wherein, IMU's English is called completely Inertial measurement unit, and Chinese meaning is Inertial measurement unit.

B. Model index

We first evaluate the trained model frame by comparing the prediction accuracy in the training dataset and the test dataset and the performance of the proposed learning-based planner against the optimization-based planner.

1) Training and testing accuracy: FIG. 6 shows a block diagram of the absolute prediction error of a trained planner versus an exemplar. The errors of the linear and angular velocities are measured separately because their influence on the planning result is different. In fig. 6, the lower and upper limits of the box represent the first quartile and the third quartile, respectively. The line in the box represents the median and the points marked with "+" outside the horizontal line represent outliers. As can be seen from fig. 6, the first and third, quartiles of the prediction error on the training data set and the test data set are very small, and the accuracy of training and testing of a trained planner is significantly higher than that of existing optimization-based planners. Here, the well-trained planner is obtained by using the local motion planning method for the mobile robot according to the present invention, and is also a planner based on learning.

Table 2 summarizes detailed indicators of the planning accuracy. It can be noted that the average error of linear and angular velocities on the test data set is as good as the training data set, indicating that the learned model has a strong generalization capability in the local planning task. We can also observe that the standard deviation of the errors on the test data set is higher than the training data set, indicating that some significant predictive outliers still exist.

	Training-v [ m/s]	Training-w [ rad/s]	Test-v [ m/s]	Test-w [ rad/s]
					Average	0.0031	0.0115	0.0037	0.0151
Standard of merit	0.0050	0.0132	0.0079	0.0308

Table 2: statistics of prediction errors

In addition to the average performance, we also investigated the case where large prediction errors occurred. Figure 7 shows the action instructions provided by the trained planner and the demonstrator in three cases within different error ranges of the test data set. In fig. 7, the left error is normal (0.0024), the middle error is large (0.1477), and the right error is large (0.5733). The robot is located at the origin where 2d occupies the north plane in the figure. Isolated gray cells are targets, while darkness of other cells represents the obstruction probability. The length of the line represents the linear velocity and the direction represents the angular velocity.

In the left case, a trained planner predicts an action that tends to stay some distance away from the obstacle clusters on both sides, which almost overlaps the decision provided by the demonstrator. In the middle, the trained planner is slightly different from the demonstrator, which may be due to ambiguities introduced by target points hidden in the obstacle clusters. When the local target lags behind the robot, as shown in the right case, it is difficult for a well-trained planner to produce exactly the same action as the demonstrator. However, the decision of a trained planner is still considered reasonable, as it turns to the correct direction with a smoother speed of change.

2) Comparison with optimization-based planner: one key motivation for the learning-based planner presented by the present invention is to reduce the computation time, which is a disadvantage of the traditional optimization-based approach. Thus, we compare the performance of a trained planner to an optimization-based planner in a complex planning environment. As described in D below, the number of candidate trajectories considered has a large impact on computation time and solution quality. In the following evaluations, we employed the demonstrator with 11 × 81 linear and angular velocity samples as the baseline planner, and evaluated the performance of the learning-based planner and the optimization-based planner with different numbers of samples (5 × 11,7 × 31,8 × 51,9 × 61,10 × 71). The best gap for each decision is defined as

Where v and w are the linear and angular velocities obtained from the evaluated planner,

and

from a baseline reference.

Fig. 8 shows a schematic comparison of the optimal gap between the learning-based planner and the optimization-based planner. As can be taken from fig. 8, the optimality error of the learning-based planner is approximately equal to the optimization-based planner, with 750 sample trajectories.

Fig. 9 shows a schematic diagram of a comparison of computation times between a learning-based planner and an optimization-based planner. As can be taken from fig. 9, the computation time of the learning-based planner is approximately equal to the optimization-based planner, with 160 sample trajectories.

As shown in fig. 8 and 9, for the optimization-based planner, although the average optimality gap decreases as the number of samples increases, the computation time increases almost linearly. In contrast, the proposed learning-based planner provides a highly competitive quality solution at around 22ms, significantly faster than the optimization-based approach. Note that better performance of the learning-based planner does not imply longer computation times. It is related to the quality of the demonstration behavior and training data set. If we solve equation (5a) sufficiently to be optimal, using it as a demonstrator, the quality of the trained strategy is expected to improve further.

C. Navigation in a simulated environment

Based on accurate frame-by-frame motion prediction, in this section, we evaluated the performance of a trained planner in navigation simulations. To check the ability to avoid obstacles, a global reference path for robot navigation is set to approach or cross the obstacle. At each sample time, a trained planner receives the local cost map and returns a velocity vector, which causes the robot to be driven to a new state according to a kinematic model. The resulting trace is shown in fig. 10. In fig. 10, the dotted line represents the global reference path, and the solid line and the dotted line correspond to the trajectories of the learning-based planner and the optimization-based demonstrator, respectively. As can be seen from fig. 10, the trajectory of the trained planner successfully avoids obstacles when the global reference values are close, while smoothly following the reference in open space. Furthermore, the trajectory of the trained planner is nearly identical to the behavior of the demonstrator, which illustrates the high quality of the trained planner in simulating the near-optimal demonstrator.

D. Navigation in the real world

Finally, we deploy a trained model into real-world navigation experiments. Given a global reference path, the task of the segway delivery robot is to follow the reference path, avoiding obstacles on the road. Two aspects are noted: response to unexpected obstacles on the reference path, and robustness to long-term operation.

1) Reaction to obstacles on the global path: figure 11 shows a schematic diagram of the response of a trained planner to an unexpected obstacle on a reference path in a real world experiment. In fig. 11, the upper part shows the local obstacle map fed to the planner. The lower half is the corresponding view captured from the robot front camera. The robot is located at a north-facing origin on the 2d local map. The length of the line represents the linear velocity and the direction represents the angular velocity.

As shown in fig. 11, the robot makes local planning decisions based on the local occupancy map constructed online. Although this map is relatively small in scale, this map provides rich ambient information. When an obstacle is encountered that obstructs the global trajectory, the trained planner successfully provides the action instructions that drive the robot to the open area.

2) And (3) long-term operation: long-term experiments were performed in narrow corridors filled with obstacles. The task of the robot is to recursively follow the global reference path without encountering any obstacles. FIG. 12 shows a schematic of navigation trajectories from two planners in a real-world environment, one being a hybrid training based on data of the exemplary trajectory and an artificially generated local cost map, and the other being a single training only with normal data collected from the exemplary experiment. In fig. 12, 2d occupies darkness of the graph in proportion to the obstacle probability; and the task of the robot is to navigate recursively in the office corridor with a trained planner without hitting an obstacle.

As can be seen from fig. 12, the robot operated with the latter planner touches several obstacles in a crowded area, and must be manually intervened for safety. In contrast, a planner trained with a mixed data set successfully drives the robot around obstacles and roams in corridors in a robust and smooth manner.

In conclusion, the invention provides a 2d local cost map based on preprocessing, and a local motion planning system with the capability of avoiding obstacles is established through deep simulation learning. A value iterative network is embedded, and a developed local planner has strong competitiveness in the aspects of decision quality, calculation time and robustness through mixed training of actual and artificial cost map images.

Future work may be extended in the following two ways. First, the distribution of the data set and the prediction error should be further studied in order to generate in a more efficient and easy-to-process manner an artificial cost map for feeding unowned context. Second, in addition to inferences about local planning strategies, we also consider extending the proposed model with additional outputs to predict uncertainty of action commands, which would be a valuable bonus for safe-to-up real-world operations.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method for planning local motion of a mobile robot, the method comprising:

determining a local target point and a local obstacle map according to a given global path and the surrounding environment where the mobile robot is located, and determining a 2d local cost map image according to the local target point and the local obstacle map;

determining a velocity of the mobile robot;

formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image for execution by the mobile robot;

wherein formulating action instructions for the mobile robot through a learning-based planner comprises:

u＝f_θ(m,u') (1)

where u ═ v, w are vectors of linear velocity v and angular velocity w to be executed, u' is a velocity vector of the mobile robot, θ is a model weight parameter, m is a 2d local cost map image, and f is a learning-based planner model.

2. The method of claim 1, wherein the determining a 2d local cost map image comprises:

acquiring data acquired by a preset sensor on the mobile robot;

3. The method of claim 1 or 2, wherein determining the 2d local cost map comprises:

m＝λm_goal-m_obs (2)

4. The method of claim 1 or 2, wherein determining the 2d local cost map comprises:

5. The method of claim 1 or 2, wherein determining the model weight parameter θ comprises:

given a set of exemplary action instructions

wherein the content of the first and second substances,

6. The method of claim 1 or 2, wherein said formulating action instructions for the mobile robot by the learning-based planner comprises:

7. The method of claim 1 or 2, wherein the method further comprises:

x_k+1＝h(x_k,u),k＝0,...,N-1 (5b)

8. The method of claim 7, wherein the acquisition sources of training samples for the learning-based planner comprise:

9. A mobile robot local motion planning apparatus, the apparatus comprising:

the preprocessor is used for determining a local target point and a local obstacle map according to a given global path and the surrounding environment where the mobile robot is located, and determining a plane 2d local cost map image according to the local target point and the local obstacle map; determining a velocity of the mobile robot;

a controller for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot;

wherein, the controller is specifically configured to:

u＝f_θ(m,u') (1)

10. The apparatus of claim 9, wherein the preprocessor is specifically configured to:

acquiring data acquired by a preset sensor on the mobile robot;

11. The apparatus according to claim 9 or 10, wherein the preprocessor is specifically configured to:

m＝λm_goal-m_obs (2)

12. The apparatus according to claim 9 or 10, wherein the preprocessor is specifically configured to:

13. The apparatus of claim 9 or 10, wherein the controller determines the model weight parameter θ in a manner comprising:

given a set of exemplary action instructions

wherein the content of the first and second substances,

14. The apparatus of claim 9 or 10, wherein the controller is specifically configured to:

15. The apparatus of claim 9 or 10, wherein the controller is further configured to:

x_k+1＝h(x_k,u),k＝0,...,N-1 (5b)

16. The apparatus of claim 15, wherein the controller is further configured to:

17. A computer storage medium having computer-executable instructions stored thereon for performing the method of local motion planning for a mobile robot of any of claims 1 to 8.