CN107861508B - Local motion planning method and device for mobile robot - Google Patents


Info

Publication number
CN107861508B
CN107861508B (application CN201710987041.8A)
Authority
CN
China
Prior art keywords
local
mobile robot
map
cost map
learning
Prior art date
Legal status
Active
Application number
CN201710987041.8A
Other languages
Chinese (zh)
Other versions
CN107861508A (en)
Inventor
刘越江
陈子冲
Current Assignee
Ninebot Beijing Technology Co Ltd
Original Assignee
Ninebot Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Ninebot Beijing Technology Co Ltd filed Critical Ninebot Beijing Technology Co Ltd
Priority to CN201710987041.8A
Publication of CN107861508A
Priority to PCT/CN2018/087326 (published as WO2019076044A1)
Application granted
Publication of CN107861508B


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a local motion planning method for a mobile robot, which comprises the following steps: determining a planar (2d) local cost map image; determining a velocity of the mobile robot; formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image for execution by the mobile robot. The invention also discloses a local motion planning device of the mobile robot.

Description

Local motion planning method and device for mobile robot
Technical Field
The invention relates to the technical field of robots, in particular to a method and a device for planning local motion of a mobile robot.
Background
Obstacle-avoidance motion planning is one of the basic skills an intelligent mobile robot is expected to master. Various algorithms have been developed over the past decade to enable robots to plan trajectories toward target points, or to follow reference paths, without hitting obstacles. Despite significant advances, mobile robots are still far from matching human motion-planning ability. For example, humans make motion decisions quickly and with negligible effort, adapt robustly to uncertainty and unforeseen obstacles, and behave very smoothly and naturally. Given sufficient local and global path information, such as Global Positioning System (GPS) data and Google maps, people navigate well under different conditions, whether walking, running or driving, thanks to powerful planning and decision-making capabilities. Mobile robots, in contrast, are still striving to achieve comparable performance.
One class of high-performance planning algorithms addresses local motion planning and obstacle avoidance from an optimization perspective. The main feature of these algorithms is minimizing a cost function (or maximizing a utility function) over candidate trajectories. At each control cycle, the optimal trajectory, together with the corresponding control strategy, is fed to the robot controller. This approach is mathematically interpretable. However, solving the optimization problem is challenging: because the objective function involves a robot dynamics model, and the constraints may include components related to complex geometry, the problem is typically nonlinear and hard to solve in real time on consumer robots with limited computational resources. To reduce computation time, researchers have proposed various methods, such as sampling and linearization, which, however, degrade the result to sub-optimal solutions.
Beyond optimization-based approaches, Imitation Learning (IL) is emerging as a technology for planning and control. The core idea of imitation learning is to learn a strategy from expert demonstrations, either by cloning the expected behavior or by inferring the reward pursued by the demonstrator. A fast-growing branch of imitation learning is the end-to-end approach, which maps raw sensory inputs directly to motor outputs without manual programming. End-to-end learning systems built on Convolutional Neural Networks (CNN) have been developed for a wide range of applications, with exciting results. However, the performance of most existing CNN-based planners or controllers falls significantly short of the demonstrated policy, in part because standard CNNs are not expressive enough for planning-based reasoning. Another challenge of imitation learning is the mismatch between the state distributions of the demonstrator and of the learned strategy, which can trap the learned strategy in sub-optimal, or even divergent, trajectories. Some approaches attempt to solve this problem by iteratively collecting training samples closer to the learned strategy. However, the scarcity of catastrophic events in the training dataset remains a real operational risk.
Therefore, how to make the mobile robot efficiently avoid obstacles remains a technical problem to be solved urgently.
Disclosure of Invention
In view of the above, the present invention is intended to provide a method and an apparatus for local motion planning of a mobile robot, based on a new imitation learning approach to local motion planning and obstacle avoidance, which can avoid obstacles efficiently, accelerate the mobile robot's local motion planning decisions, and at the same time make those decisions as optimal, safe and general as possible.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a method for planning local motion of a mobile robot, where the method includes:
determining a planar (2d) local cost map (Costmap) image;
determining a velocity of the mobile robot;
formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image for execution by the mobile robot.
In the foregoing solution, optionally, the determining a 2d local cost map image includes:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
In the foregoing solution, optionally, the learning-based planner learns the local motion planning strategy by:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u = f_θ(m, u′)   (1)
where u = (v, w) is the vector of the linear velocity v and angular velocity w to be executed, u′ is the current velocity vector of the mobile robot, θ is the model weight parameter, and m is the 2d local cost map image.
In the foregoing solution, optionally, the determining the 2d local cost map includes:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs   (2)
where m_obs is the local obstacle map of obstacle probabilities, m_goal is a binary target map in which the pixel at the local target point is set to 1 and all other pixels are set to 0, and λ is a hyper-parameter serving as the reward coefficient. If the nearest valid sub-target point on the reference path lies outside the 2d local cost map window, it is replaced by its projection onto the map border.
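As an illustrative sketch (not part of the patent), equation (2) can be realized on an obstacle-probability grid as follows; the clamping step plays the role of projecting an out-of-window sub-target point onto the map border. The function name, the default value of `lam`, and the array layout are assumptions:

```python
import numpy as np

def build_local_costmap(m_obs, goal_rc, lam=2.0):
    """Construct the 2d local cost map m = lam * m_goal - m_obs (eq. 2).

    m_obs   : (H, W) array of obstacle probabilities in [0, 1]
    goal_rc : (row, col) of the local target point; may lie outside the window
    lam     : reward-coefficient hyper-parameter (value is an assumption)
    """
    h, w = m_obs.shape
    r, c = goal_rc
    # If the sub-target lies outside the window, replace it with its
    # projection onto the map border (clamping to the frame).
    r = min(max(r, 0), h - 1)
    c = min(max(c, 0), w - 1)
    m_goal = np.zeros_like(m_obs)
    m_goal[r, c] = 1.0
    return lam * m_goal - m_obs
```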
In the foregoing solution, optionally, the determining the 2d local cost map includes:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
In the foregoing solution, optionally, the determining the model weight parameter θ includes:
given a set of demonstrated action instructions D = {(m_i, u′_i, û_i)}, i = 1, …, n, the planner trains the required local motion planning strategy based on an error minimization criterion:

J(θ) = Σ_i |û_i − f_θ(m_i, u′_i)|²   (3)

where (m_i, u′_i, û_i) is a demonstration tuple and J(θ) is the error function: the cumulative sum of the squared absolute differences between the demonstrated action instruction û_i and the actual action command f_θ(m_i, u′_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
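A minimal sketch of the error criterion of equation (3), using a linear stand-in for f_θ (the patent's actual f_θ is a deep network; the function name and the linear model are illustrative assumptions):

```python
import numpy as np

def loss_J(theta, M, U_prev, U_demo):
    """J(theta) = sum_i |u_hat_i - f_theta(m_i, u'_i)|^2  (eq. 3).

    Stand-in model: f_theta(m, u') = theta @ concat(m.ravel(), u'),
    where theta has shape (2, H*W + 2) and outputs (v, w).
    """
    J = 0.0
    for m, u_prev, u_demo in zip(M, U_prev, U_demo):
        x = np.concatenate([m.ravel(), u_prev])
        u_pred = theta @ x                        # predicted (v, w)
        J += float(np.sum(np.abs(u_demo - u_pred) ** 2))
    return J
```

In practice θ would be fitted by gradient descent on J(θ) and then frozen for deployment, as the text describes.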
In the foregoing solution, optionally, the formulating, by the learning-based planner, an action instruction for the mobile robot includes:
inputting a 2d local cost map image to a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operation of strategy improvement and truncation strategy evaluation;
v_{k+1}(s) = max_u Σ_{s′,r} p(s′, r | s, u) · [r + γ·v_k(s′)]   (4)

where s and r correspond to the position and cost of a pixel in the 2d local cost map, respectively, and p(s′, r | s, u) is the transition probability; k is the number of iterations; γ is a discount factor; v_{k+1}(s) is the value at position s after k+1 iterations; v_k(s′) is the value at position s′ after k iterations; u is an action command executed by the mobile robot; and s′ is the pixel position in the 2d local cost map reached after the mobile robot executes action command u at position s;
determining, from the high-level planning features extracted by formula (4), the high-level planning features belonging to the region of interest;
and fusing the high-level planning features from the attention area with the speed of the mobile robot, and feeding the fused result to a fully connected layer of the deep neural network capable of generating smooth, continuous action commands, so as to obtain the action instructions formulated for the mobile robot.
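The value iteration of equation (4) can be sketched on a grid as follows, under two simplifying assumptions not stated in the patent: transitions are deterministic moves to one of the four neighbours (border pixels see themselves via edge padding), and the reward r of a step is the cost-map value of the pixel occupied. All names are illustrative:

```python
import numpy as np

def value_iteration(cost_map, k_iters=30, gamma=0.9):
    """Truncated value iteration over the 2d local cost map (eq. 4),
    with deterministic 4-neighbour transitions as a simplification of
    p(s', r | s, u)."""
    def shift(v, dr, dc):
        # value of the neighbour in direction (dr, dc), edge-padded
        p = np.pad(v, 1, mode='edge')
        return p[1 + dr:1 + dr + v.shape[0], 1 + dc:1 + dc + v.shape[1]]

    v = np.zeros_like(cost_map, dtype=float)
    for _ in range(k_iters):
        # v_{k+1}(s) = max_u [ r + gamma * v_k(s') ]
        neigh = np.stack([shift(v, 1, 0), shift(v, -1, 0),
                          shift(v, 0, 1), shift(v, 0, -1)])
        v = cost_map + gamma * neigh.max(axis=0)
    return v
```

On a cost map built per equation (2), high values of v propagate outward from the goal pixel, giving the "high-level planning features" the module extracts.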
In the foregoing scheme, optionally, the method further includes:
selecting, as the demonstrator, an optimization-based planner that can be provided with a preset amount of computing resources;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
min_{u_0,…,u_{N−1}}  w_1·d_g² + w_2·α_g² + w_3·Σ_{k=0}^{N−1} ‖u_k‖²   (5a)

x_{k+1} = h(x_k, u_k),  k = 0, …, N−1   (5b)

m_obs(x_k) ≤ m̄_obs,  k = 0, …, N   (5c)

where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's heading and the direction from the robot's position to the local target point at time step N, h(x_k, u_k) is the kinematic model of the robot, m̄_obs is the maximum obstacle probability allowed to be traversed, and w_1, w_2, w_3 are cost weight parameters.
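A rough stand-in for the optimization-based demonstrator of equations (5a)-(5c) is sketched below using random shooting over a unicycle kinematic model for h; the obstacle constraint (5c) is omitted for brevity, and the action bounds, weights, and function names are illustrative assumptions rather than the patent's actual solver:

```python
import numpy as np

def unicycle(x, u, dt=0.1):
    """Kinematic model h(x_k, u_k): x = (px, py, heading), u = (v, w)."""
    px, py, th = x
    v, w = u
    return np.array([px + v * np.cos(th) * dt,
                     py + v * np.sin(th) * dt,
                     th + w * dt])

def mpc_demonstrator(x0, goal, n=10, n_samples=300,
                     w1=1.0, w2=0.1, w3=0.01, seed=0):
    """Random-shooting minimization of eq. (5a) subject to (5b).
    Returns the first action u_0 of the best sampled sequence."""
    rng = np.random.default_rng(seed)
    best_cost, best_u0 = np.inf, np.zeros(2)
    for _ in range(n_samples):
        us = rng.uniform([0.0, -1.0], [1.0, 1.0], size=(n, 2))  # (v, w) bounds
        x = np.asarray(x0, dtype=float)
        for u in us:                  # roll the model h over the horizon (5b)
            x = unicycle(x, u)
        d_g = np.hypot(goal[0] - x[0], goal[1] - x[1])
        a_g = abs((np.arctan2(goal[1] - x[1], goal[0] - x[0]) - x[2] + np.pi)
                  % (2 * np.pi) - np.pi)
        cost = w1 * d_g**2 + w2 * a_g**2 + w3 * float(np.sum(us**2))
        if cost < best_cost:
            best_cost, best_u0 = cost, us[0]
    return best_u0
```

A real demonstrator would solve (5a)-(5c) with a nonlinear programming solver; random shooting merely illustrates the receding-horizon structure.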
In the foregoing solution, optionally, the acquisition source of the training samples of the learning-based planner includes:
generating a first cost map from the normal experimental trajectories of the demonstrator; wherein the normal experimental trajectories of the demonstrator are the trajectories that the learning-based planner is expected to encounter with probability within a predetermined proportion;
a second cost map, generated through human intervention, containing dangerous situations unexpected by the demonstrator, used to compensate for the low-probability events that the learning-based planner will encounter but that the demonstrator overlooks.
In a second aspect, an embodiment of the present invention provides a local motion planning apparatus for a mobile robot, where the apparatus includes:
the preprocessor is used for determining a plane 2d local cost map image; determining a velocity of the mobile robot;
a controller for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot.
In the foregoing scheme, optionally, the preprocessor is specifically configured to:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
In the foregoing scheme, optionally, the controller is specifically configured to:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u = f_θ(m, u′)   (1)
where u = (v, w) is the vector of the linear velocity v and angular velocity w to be executed, u′ is the current velocity vector of the mobile robot, θ is the model weight parameter, and m is the 2d local cost map image.
In the foregoing scheme, optionally, the preprocessor is specifically configured to:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs   (2)
where m_obs is the local obstacle map of obstacle probabilities, m_goal is a binary target map in which the pixel at the local target point is set to 1 and all other pixels are set to 0, and λ is a hyper-parameter serving as the reward coefficient. If the nearest valid target point on the reference path lies outside the 2d local cost map window, it is replaced by its projection onto the map border.
In the foregoing scheme, optionally, the preprocessor is specifically configured to:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
In the foregoing solution, optionally, the manner for determining the model weight parameter θ by the controller includes:
given a set of demonstrated action instructions D = {(m_i, u′_i, û_i)}, i = 1, …, n, the planner trains the required local motion planning strategy based on an error minimization criterion:

J(θ) = Σ_i |û_i − f_θ(m_i, u′_i)|²   (3)

where (m_i, u′_i, û_i) is a demonstration tuple and J(θ) is the error function: the cumulative sum of the squared absolute differences between the demonstrated action instruction û_i and the actual action command f_θ(m_i, u′_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
In the foregoing scheme, optionally, the controller is specifically configured to:
inputting a 2d local cost map image to a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operation;
v_{k+1}(s) = max_u Σ_{s′,r} p(s′, r | s, u) · [r + γ·v_k(s′)]   (4)

where s and r correspond to the position and cost of a pixel in the 2d local cost map, respectively, and p(s′, r | s, u) is the transition probability; k is the number of iterations; γ is a discount factor; v_{k+1}(s) is the value at position s after k+1 iterations; v_k(s′) is the value at position s′ after k iterations; u is an action command executed by the mobile robot; and s′ is the pixel position in the 2d local cost map reached after the mobile robot executes action command u at position s;
determining, from the high-level planning features extracted by formula (4), the high-level planning features belonging to the region of interest;
and fusing the high-level planning features from the attention area with the speed of the mobile robot, and feeding the fused result to a fully connected layer of the deep neural network capable of generating smooth, continuous action commands, so as to obtain the action instructions formulated for the mobile robot.
In the foregoing solution, optionally, the controller is further configured to:
selecting, as the demonstrator, an optimization-based planner that can be provided with a preset amount of computing resources;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
min_{u_0,…,u_{N−1}}  w_1·d_g² + w_2·α_g² + w_3·Σ_{k=0}^{N−1} ‖u_k‖²   (5a)

x_{k+1} = h(x_k, u_k),  k = 0, …, N−1   (5b)

m_obs(x_k) ≤ m̄_obs,  k = 0, …, N   (5c)

where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's heading and the direction from the robot's position to the local target point at time step N, h(x_k, u_k) is the kinematic model of the robot, m̄_obs is the maximum obstacle probability allowed to be traversed, and w_1, w_2, w_3 are cost weight parameters.
In the foregoing solution, optionally, the controller is further configured to:
generating a first cost map from the normal experimental trajectories of the demonstrator; wherein the normal experimental trajectories of the demonstrator are the trajectories that the learning-based planner is expected to encounter with probability within a predetermined proportion;
a second cost map, generated through human intervention, containing dangerous situations unexpected by the demonstrator, used to compensate for the low-probability events that the learning-based planner will encounter but that the demonstrator overlooks;
and using the first cost map and the second cost map as training samples of the learning-based planner.
In a third aspect, an embodiment of the present invention provides a computer storage medium, where a computer program is stored in the computer storage medium, where the computer program is used to execute the local motion planning method for a mobile robot described above.
The invention provides a local motion planning method and device for a mobile robot: a 2d local cost map image is determined; the velocity of the mobile robot is determined; and, based on the velocity and the 2d local cost map image, an action instruction is formulated for the mobile robot by a learning-based planner, so that the mobile robot executes the action instruction. In this way, obstacles can be avoided efficiently, the mobile robot's local motion planning decisions are accelerated, and the decisions made are as optimal, safe and universal as possible.
Drawings
Fig. 1 is a schematic flow chart of an implementation of a local motion planning method for a mobile robot according to the present invention;
fig. 2 is a schematic structural diagram of a local motion planning apparatus for a mobile robot according to the present invention;
FIG. 3 is a block diagram of a local motion planning system with a policy network according to the present invention;
FIG. 4 is a schematic diagram of a locally planned deep neural network provided by the present invention;
FIG. 5 is a schematic diagram of the Segway delivery robot provided in the present invention;
FIG. 6 is a plot of the absolute prediction error of the trained planner relative to the demonstrator provided by the present invention;
FIG. 7 is a diagram of the action instructions given by the trained planner and the demonstrator for three cases within different error ranges of a test data set;
FIG. 8 is a schematic diagram illustrating a comparison of the optimal gap between a learning-based planner and an optimization-based planner provided by the present invention;
FIG. 9 is a schematic diagram illustrating a comparison of computation time between a learning-based planner and an optimization-based planner provided by the present invention;
FIG. 10 is a navigation diagram in a simulation environment provided by the present invention;
FIG. 11 is a schematic diagram of the response of the trained planner provided by the present invention to an unexpected obstacle on a reference path in a real-world experiment;
FIG. 12 is a schematic diagram of a navigation trajectory from a real-world environment provided by the present invention.
Detailed Description
In order to better explain the present invention, some prior studies on imitation learning are described below.
Currently, research on imitation learning focuses widely on end-to-end approaches. For example, convolutional neural network models have been trained to map visual input images to left/right steering commands for ground mobile robots, quad-rotor aircraft, and self-driving cars. Another recent study proposed a target-driven motion planner that generates linear and angular velocities from the raw input of a laser rangefinder. Despite these advances, the effectiveness of network models in a planning context remains a challenge for existing policy imitation approaches.
To improve planning-based reasoning, new neural network architectures have recently been investigated in several works. An end-to-end architecture called the predictron was introduced, whose core is an abstract model represented by a Markov reward process, from which a value estimate can be derived. Another deep neural network architecture with a similar underlying motivation is the value iteration network, which includes a special module for recursive value iteration. Other researchers developed a recurrent network as a representation of path-integral optimal control to learn cost and dynamics models.
Another practical challenge of imitation learning is the mismatch between the data distributions of the demonstrated and learned strategies. To address this challenge, researchers have proposed an iterative data aggregation (DAgger) method and applied it to learning reactive controllers for micro aerial vehicles. This approach was further extended to SafeDAgger, which trains a safety strategy to prevent the learning process from falling into dangerous states while reducing the frequency of human intervention. Another recent article addresses the problem by gradually closing the distribution gap between the demonstrated and learned strategies, using an adaptive model predictive controller as a demonstrator that adjusts its strategy.
However, the scarcity of catastrophic events in the training data sets of these approaches remains a drawback for safety-critical applications.
Based on the above, the invention provides an imitation learning method for local motion planning and obstacle avoidance of a mobile robot. The main goal is to speed up the local motion planning decisions of the mobile robot, while making those decisions as optimal, safe and versatile as possible.
The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.
The embodiment of the invention provides a local motion planning method for a mobile robot, which mainly comprises the following steps of:
and step 101, determining a plane (2d) local cost map image.
Optionally, the determining a 2d local cost map image includes:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
Here, the predetermined sensor may be one or a plurality of sensors. For example, the predetermined sensor may be a camera or a video camera on the mobile robot.
And 102, determining the speed of the mobile robot.
As one embodiment, the determining the velocity of the mobile robot includes:
determining first position information of the mobile robot at a first moment;
determining second position information of the mobile robot at a second moment; wherein the first time is a time before the second time;
and determining the speed of the mobile robot according to the first position information, the second position information, the first time and the second time.
For example, the speed of the mobile robot = (second position information − first position information) / (second time − first time).
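The finite-difference estimate above can be written as follows (the function name is hypothetical):

```python
def estimate_velocity(pos1, t1, pos2, t2):
    """Finite-difference velocity: (pos2 - pos1) / (t2 - t1), per component.

    pos1, pos2 : positions at the first and second moments (e.g. (x, y))
    t1, t2     : the corresponding timestamps, t1 < t2
    """
    dt = t2 - t1
    return tuple((b - a) / dt for a, b in zip(pos1, pos2))
```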
Of course, the manner of determining the speed of the mobile robot is not limited to the above-listed form, and may be determined by other means. For example, data is directly obtained from a speed sensor of the mobile robot, and the speed of the mobile robot is determined.
And 103, formulating action instructions for the mobile robot through a learning-based planner based on the speed and the 2d local cost map image so as to execute the action instructions by the mobile robot.
Here, the action instruction includes:
the linear velocity and the angular velocity that the mobile robot needs to execute.
In this way, by executing the action command determined in the above manner, the mobile robot can effectively avoid obstacles, and the action command can be determined in a short time.
The learning-based planner is a trained planner and can rapidly and accurately make action instructions for the mobile robot.
Illustratively, the learning-based planner learns a local motion planning strategy by:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u = f_θ(m, u′)   (1)
where u = (v, w) is the vector of the linear velocity v and angular velocity w to be executed, u′ is the current velocity vector of the mobile robot, θ is the model weight parameter, and m is the 2d local cost map image; the cost map may be determined in a variety of ways.
In the above scheme, one of the determination manners of the 2d local cost map is as follows:
determining a cost map according to equation (2), wherein the cost map is constructed as a sum of local target rewards and obstacle penalties:
m = λ·m_goal − m_obs   (2)
where m_obs is the local obstacle map of obstacle probabilities, m_goal is a binary target map in which the pixel at the local target point is set to 1 and all other pixels are set to 0, and λ is a hyper-parameter serving as the reward coefficient. If the nearest valid target point on the reference path lies outside the 2d local cost map window, it is replaced by its projection onto the map border.
In the above scheme, the second determination method of the 2d local cost map is as follows:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
In the above scheme, the determining the model weight parameter θ includes:
given a set of demonstrated action instructions D = {(m_i, u′_i, û_i)}, i = 1, …, n, the planner trains the required local motion planning strategy based on an error minimization criterion:

J(θ) = Σ_i |û_i − f_θ(m_i, u′_i)|²   (3)

where (m_i, u′_i, û_i) is a demonstration tuple and J(θ) is the error function: the cumulative sum of the squared absolute differences between the demonstrated action instruction û_i and the actual action command f_θ(m_i, u′_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
In the above solution, the formulating an action instruction for the mobile robot by the learning-based planner includes:
inputting a 2d local cost map image to a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operation of strategy improvement and truncation strategy evaluation;
v_{k+1}(s) = max_u Σ_{s′,r} p(s′, r | s, u) · [r + γ·v_k(s′)]   (4)

where s and r correspond to the position and cost of a pixel in the 2d local cost map, respectively, and p(s′, r | s, u) is the transition probability; k is the number of iterations; γ is a discount factor; v_{k+1}(s) is the value at position s after k+1 iterations; v_k(s′) is the value at position s′ after k iterations; u is an action command executed by the mobile robot; and s′ is the pixel position in the 2d local cost map reached after the mobile robot executes action command u at position s;
determining the high-level planning features belonging to the region of interest from the high-level planning features extracted by formula (4);

and fusing the high-level planning features from the region of interest with the speed of the mobile robot, and providing the fusion result to a fully connected layer of the deep neural network that can generate smooth, continuous action commands, so as to obtain the action instructions formulated for the mobile robot, enabling the mobile robot to imitate the expected behavior and avoid obstacles in local motion planning.

Here, the region of interest is a partial region of the cost map.

Here, extracting features only from the region of interest, for example only the features consistent with the direction toward the target point, helps to obtain the action instruction quickly.
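As an illustration of the recursion in formula (4), the sketch below runs truncated value iteration on a small cost map, assuming deterministic 4-connected pixel moves (so p(s', r | s, u) is 0 or 1); the grid size, step penalty and reward values are invented for the example, and a real value iteration module would implement the recursion with convolutions inside the network.

```python
import numpy as np

def value_iteration(cost_map, gamma=0.9, K=36):
    """Truncated value iteration over a 2d cost map, as in eq. (4).

    cost_map holds the per-pixel reward (target reward minus obstacle
    penalty); moves are deterministic steps to 4-connected neighbors.
    """
    h, w = cost_map.shape
    v = np.zeros((h, w))
    for _ in range(K):
        v_new = v.copy()
        for y in range(h):
            for x in range(w):
                best = -np.inf
                for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # r + gamma * v_k(s') for the deterministic move
                        best = max(best, cost_map[ny, nx] + gamma * v[ny, nx])
                v_new[y, x] = best
        v = v_new
    return v

m = -0.1 * np.ones((5, 5))   # small step penalty everywhere
m[0, 4] = 1.0                # local target reward
v = value_iteration(m, gamma=0.9, K=36)
# The value field peaks at pixels adjacent to the target.
print(np.unravel_index(np.argmax(v), v.shape))
```

K = 36 matches the number of iterations quoted later in the document for this map scale.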
In the above scheme, the method further comprises:
selecting an optimization-based planner provided with a preset amount of computing resources as the demonstrator; here, the preset amount may be understood as a large amount;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
min_u  w_1·d_g + w_2·α_g + w_3·Σ_{k=0}^{N-1} m_obs(x_k)   (5a)

x_{k+1} = h(x_k, u),  k = 0, ..., N−1   (5b)

m_obs(x_k) ≤ m̄_obs,  k = 1, ..., N   (5c)

where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, m̄_obs is the maximum obstacle probability allowed for traversal, and w_1, w_2, w_3 are cost weight parameters.
Illustratively, the acquisition sources of training samples for the learning-based planner include:
generating a first cost map according to the normal experimental trajectories of the demonstrator; wherein the normal experimental trajectories of the demonstrator are the trajectories that the learning-based planner is expected to encounter with a probability within a predetermined proportional range;

and a second cost map, generated by human intervention, containing dangerous situations unexpected by the demonstrator, for compensating for the small-probability events that the demonstrator overlooks but that the learning-based planner will encounter.

Here, the probability within a predetermined proportional range may be understood as: most of the time, the learning-based planner will encounter such trajectories.
Optionally, the method for generating the second cost map includes:
firstly, randomly generating a preset number of binary obstacle clusters;

then, applying Gaussian blur to the binary obstacle clusters, converting the obstacle map into probabilities;

and finally, randomly drawing the local target points on the obstacle map.
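The three generation steps above can be sketched as follows. This is a simplified illustration: the cluster shape, blur kernel, λ value and free-space threshold are assumptions, and the wrap-around of `np.roll` at the borders is a shortcut a real implementation would avoid.

```python
import numpy as np

def gaussian_blur(a, passes=2):
    """Separable binomial [1, 2, 1]/4 kernel, a cheap Gaussian approximation."""
    for _ in range(passes):
        a = 0.25 * (np.roll(a, 1, 0) + np.roll(a, -1, 0)) + 0.5 * a
        a = 0.25 * (np.roll(a, 1, 1) + np.roll(a, -1, 1)) + 0.5 * a
    return a

def random_cost_map(size=28, n_clusters=4, lam=0.5, seed=None):
    rng = np.random.default_rng(seed)
    obs = np.zeros((size, size))
    for _ in range(n_clusters):                  # step 1: binary obstacle clusters
        cy, cx = rng.integers(2, size - 2, size=2)
        obs[cy - 1:cy + 2, cx - 1:cx + 2] = 1.0
    m_obs = gaussian_blur(obs)                   # step 2: blur into probabilities
    m_obs = m_obs / m_obs.max()
    while True:                                  # step 3: draw a target on free space
        gy, gx = rng.integers(0, size, size=2)
        if m_obs[gy, gx] < 0.1:
            break
    m_goal = np.zeros_like(m_obs)
    m_goal[gy, gx] = 1.0
    return lam * m_goal - m_obs                  # combined as in equation (2)

m = random_cost_map(seed=0)
print(m.shape, round(float(m.max()), 2))
```

Each call yields a fresh artificial training map; the generated map is already in the cost-map form consumed by the planner.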
The method for planning the local motion of the mobile robot can efficiently avoid obstacles and speed up the mobile robot's local motion planning decisions, while making those decisions as optimal, safe and universal as possible.
Example two
The embodiment provides a local motion planning device for a mobile robot, which comprises:
a preprocessor 10 for determining a planar 2d local cost map image; determining a velocity of the mobile robot;
a controller 20 for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot.
In the above solution, the mobile robot includes a learning-based planner.
As an embodiment, the preprocessor 10 is specifically configured to:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
Specifically, how the learning-based planner learns the local motion planning strategy, how to select the demonstrator, how to obtain the training sample, and other processing manners can be realized by referring to corresponding contents in the mobile robot local motion planning method; and will not be described in detail herein.
As an embodiment, the controller 20 is specifically configured to:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u = f_θ(m, u')   (1)

where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the velocity vector of the mobile robot, θ is a model weight parameter, and m is the 2d local cost map image.
As an embodiment, the preprocessor 10 is specifically configured to:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs   (2)

wherein m_obs is the local obstacle map of obstacle probabilities, m_goal is a binary target map in which the pixel value of the local target point is set to 1 and all other pixel values are set to 0, and λ is a hyper-parameter related to the reward coefficient; if the nearest valid target point on the reference path lies outside the 2d local cost map window, the valid target point is replaced by its projection onto the map border.
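A minimal sketch of formula (2) together with the out-of-window target handling; clamping the goal coordinates to the map border is used here as a simple stand-in for the projection described above, and the map contents are invented for the example.

```python
import numpy as np

def local_cost_map(m_obs, goal_xy, lam=0.5):
    """m = lam * m_goal - m_obs (eq. 2); goals outside the window are
    clamped onto the map border (a stand-in for the exact projection)."""
    h, w = m_obs.shape
    gx = int(np.clip(goal_xy[0], 0, w - 1))
    gy = int(np.clip(goal_xy[1], 0, h - 1))
    m_goal = np.zeros_like(m_obs)
    m_goal[gy, gx] = 1.0
    return lam * m_goal - m_obs

m_obs = np.zeros((8, 8))
m_obs[3, 3] = 0.9                                    # one probable obstacle
m = local_cost_map(m_obs, goal_xy=(12, 4), lam=0.5)  # goal beyond the right border
print(np.unravel_index(np.argmax(m), m.shape))       # (4, 7): goal on the border
```

The brightest pixel of the resulting map is the (projected) target reward, and the obstacle pixel carries a negative penalty, matching the reward-minus-penalty construction.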
As an embodiment, the preprocessor 10 is specifically configured to:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein the preset neural network model is a neural network model that omits the convolutional layer.
As an embodiment, the manner in which the controller 20 determines the model weight parameter θ includes:
given a set of exemplary action instructions D = {(m_i, u'_i, u*_i)}_{i=1}^{M}, the planner trains the required local motion planning strategy based on the error minimization criterion:

θ* = argmin_θ J(θ) = Σ_{i=1}^{M} |u*_i − f_θ(m_i, u'_i)|²   (3)

wherein (m_i, u'_i, u*_i) is an exemplary tuple, and J(θ) represents the error function, i.e. the cumulative sum of the squares of the absolute values of the differences between the exemplary action instructions u*_i and the actual action commands f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
As an embodiment, the controller 20 is specifically configured to:
inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through a recursive operation:

v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u)·[r + γ·v_k(s')]   (4)

wherein s and r correspond to the position and the cost of a pixel in the 2d local cost map, respectively; p(s', r | s, u) represents the transition probability; k represents the number of iterations; γ represents a discount factor; v_{k+1}(s) represents the value of the mobile robot at position s after k+1 iterations; v_k(s') represents the value of the mobile robot at position s' after k iterations; u represents an action command executed by the mobile robot; and s' represents the position of the pixel in the 2d local cost map after the mobile robot executes the action command u at position s;

determining the high-level planning features belonging to the region of interest from the high-level planning features extracted by formula (4);

and fusing the high-level planning features from the region of interest with the speed of the mobile robot, and providing the fusion result to a fully connected layer of the deep neural network that can generate smooth, continuous action commands, so as to obtain the action instructions formulated for the mobile robot.
As an embodiment, the controller 20 is further configured to:
selecting an optimization-based planner that can provide a preset number of computing resources as a demonstrator;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
min_u  w_1·d_g + w_2·α_g + w_3·Σ_{k=0}^{N-1} m_obs(x_k)   (5a)

x_{k+1} = h(x_k, u),  k = 0, ..., N−1   (5b)

m_obs(x_k) ≤ m̄_obs,  k = 1, ..., N   (5c)

where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, m̄_obs is the maximum obstacle probability allowed for traversal, and w_1, w_2, w_3 are cost weight parameters.
As an embodiment, the controller 20 is further configured to:
generating a first cost map according to the normal experimental trajectories of the demonstrator; wherein the normal experimental trajectories of the demonstrator are the trajectories that the learning-based planner is expected to encounter with a probability within a predetermined proportional range;

a second cost map, generated by human intervention, containing dangerous situations unexpected by the demonstrator, for compensating for the small-probability events that the demonstrator overlooks but that the learning-based planner will encounter;
and using the first cost map and the second cost map as training samples of the learning-based planner.
Those skilled in the art should understand that the functions implemented by each processing module in the local motion planning apparatus for a mobile robot shown in fig. 2 can be understood with reference to the related description of the local motion planning method for a mobile robot, and that these functions can be implemented by a program running on a processor or by dedicated logic circuits.
The local motion planning device for the mobile robot can enable the mobile robot to efficiently avoid obstacles and quickly make local motion planning decisions, while making those decisions as optimal, safe and universal as possible.
EXAMPLE III
Based on the method and the device for planning the local motion of the mobile robot in the first embodiment and the second embodiment, a method for planning the local motion of the mobile robot and avoiding obstacles, which is proposed by deep simulation learning, is given below. The main goal is to speed up the mobile robot making local motion planning decisions, while making the decisions as optimal, safe and versatile as possible.
A. System architecture
Fig. 3 is a block diagram of a local motion planning system with a policy network. As can be seen from fig. 3, the system mainly includes two planning blocks. The first block preprocesses the raw sensing data and generates a local occupancy map describing the surrounding obstacles, together with a local target point extracted from the global path according to the pose of the robot. These intermediate results are then fed to the second block, where we use a deep neural network to simulate the local planning strategy. In addition, we also provide the speed of the robot as a network input to improve the smoothness of sequential decisions. During deployment, the proposed neural network strategy generates action commands by performing a feed-forward calculation at each sample time, and is therefore computationally efficient and well suited to real-time decision-making.
B. Problem formulation and how to learn local motion planning strategies
Given the 2d local cost map image and the mobile robot speed, the local motion planning strategy provides the following action commands:
u = f_θ(m, u')   (1)

where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the latest velocity vector of the mobile robot, θ is a model weight parameter, and m is the 2d local cost map image; the cost map may be determined in a variety of ways.
Specifically, the cost map is constructed as the sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs   (2)

wherein m_obs is the local obstacle map of obstacle probabilities, m_goal is a binary target map in which the pixel value of the local target point is equal to 1 and all others are 0, and λ is a hyper-parameter related to the reward coefficient; if the nearest valid target point on the reference path lies outside the 2d local cost map window, the valid target point is replaced by its projection onto the map border.
It should be noted that the cost map can also be determined by other ways:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a cost map through learning of the preset neural network model; wherein the preset neural network model is a neural network model that omits the convolutional layer.
Note that the cost map can also be learned within the neural network by adding an additional convolutional layer at the front of the network. However, in our empirical experiments we observed that the inference precision differed very little. Two possible reasons are: the cost map calculated from equation (2) already represents the essence of the demonstrator's cost function, and the learned cost function ultimately adapts to the cost map. In this work, we removed the convolutional layer for cost learning, aiming to reduce model redundancy.
Given a set of exemplary action instructions D = {(m_i, u'_i, u*_i)}_{i=1}^{M}, the planner trains the required local motion planning strategy based on the error minimization criterion:

θ* = argmin_θ J(θ) = Σ_{i=1}^{M} |u*_i − f_θ(m_i, u'_i)|²   (3)

wherein (m_i, u'_i, u*_i) is an exemplary tuple, and J(θ) represents the error function, i.e. the cumulative sum of the squares of the absolute values of the differences between the exemplary action instructions u*_i and the actual action commands f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
C. Neural network model, used to represent f_θ(m_i, u'_i)
Inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through a recursive operation of strategy improvement and truncated strategy evaluation:

v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u)·[r + γ·v_k(s')]   (4)

wherein s and r correspond to the position and the cost of a pixel in the cost map, respectively; p(s', r | s, u) represents the transition probability; k represents the number of iterations; γ represents a discount factor with a value between 0 and 1; v_{k+1}(s) represents the value of the mobile robot at position s after k+1 iterations; v_k(s') represents the value of the mobile robot at position s' after k iterations; u represents an action command executed by the mobile robot; and s' represents the position of the pixel in the 2d local cost map after the mobile robot executes the action command u at position s;
determining the high-level planning features belonging to the region of interest, such as the Q values in fig. 4, from the high-level planning features extracted by formula (4);

the high-level planning features from the region of interest are fused with the speed of the mobile robot, and the fusion result is provided to a fully connected layer that can generate smooth, continuous action commands, so as to imitate the expected behavior and avoid obstacles in local motion planning.
Here, the region of interest is a partial region on the 2d local cost map.
Here, extracting features only from the region of interest, for example only the features consistent with the direction toward the target point, helps to obtain the action instruction quickly.
Fig. 4 shows a schematic diagram of the deep neural network for local planning. In fig. 4, the parameters of the convolutional layers denote filter size, depth and stride, and the parameters of the fully connected layers indicate the size of the output units. In the experiments, the number of iterations K can be chosen as 36; the choice of K is related to the size of the local cost map image.
D. Demonstrator based on optimization
Since the main goal of the proposed method is to learn an optimal strategy, the natural choice of demonstrator is an optimization-based planner provided with a large amount of computational resources. In the present invention, we use a well-developed local planner as the demonstrator, which tries to minimize a cost at each sampling time. Specifically, the demonstrator determines the action command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions:
min_u  w_1·d_g + w_2·α_g + w_3·Σ_{k=0}^{N-1} m_obs(x_k)   (5a)

x_{k+1} = h(x_k, u),  k = 0, ..., N−1   (5b)

m_obs(x_k) ≤ m̄_obs,  k = 1, ..., N   (5c)

where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, m̄_obs is the maximum obstacle probability allowed for traversal, and w_1, w_2, w_3 are cost weight parameters. To simplify the calculation, the control action is assumed to be constant within the prediction horizon.
During robot navigation, the optimization of equations (5a), (5b), (5c) needs to be repeated at each sampling time. However, the terms m_obs(x_k) and g(x_k, u) may involve non-linearities that make the problem difficult to solve effectively. Conventional approaches use a sampling-based solver, which often settles for an approximate, sub-optimal solution. At each planning cycle, a number of potential trajectories within the prediction horizon are first generated and then evaluated according to the associated cost values. The quality of the resulting solution therefore depends largely on the number and diversity of the candidate trajectories considered. For behavior demonstration purposes, we employ a planner with a large number of samples to provide a near-optimal reference strategy.
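The sampling-based solver just described can be sketched as follows: a grid of constant (v, w) candidates is rolled out through a unicycle kinematic model over the horizon and scored by an accumulated cost, and the cheapest candidate wins. The cost function, velocity grids, horizon and time step below are invented for illustration and are not the demonstrator's actual parameters.

```python
import numpy as np

def sample_based_planner(cost_fn, x0, N=10, dt=0.1,
                         v_grid=np.linspace(0.0, 0.5, 11),
                         w_grid=np.linspace(-0.8, 0.8, 9)):
    """Sampling-based approximation of (5a): roll out each constant (v, w)
    candidate over the horizon and return the one with the lowest cost."""
    best_u, best_cost = None, np.inf
    for v in v_grid:
        for w in w_grid:
            x, y, th = x0
            cost = 0.0
            for _ in range(N):                 # unicycle rollout, h(x_k, u)
                x += v * np.cos(th) * dt
                y += v * np.sin(th) * dt
                th += w * dt
                cost += cost_fn(x, y)          # accumulated stage cost
            if cost < best_cost:
                best_u, best_cost = (v, w), cost
    return best_u

# Toy cost: distance to a goal at (1, 0) plus a penalty below the x-axis.
cost = lambda x, y: np.hypot(x - 1.0, y) + (5.0 if y < -0.05 else 0.0)
u = sample_based_planner(cost, x0=(0.0, 0.0, 0.0))
print(u)  # best command: full linear speed, (approximately) zero turn rate
```

A finer grid improves the solution but multiplies the rollouts, which is exactly the computation-time versus quality trade-off discussed above.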
E. Data acquisition
As previously mentioned, an open challenge of imitation learning is the state-distribution mismatch between the training data set and the learned strategy. To effectively address this problem and cover the possible observations as completely as possible, we collected training samples from two sources. The first is the set of normal experimental trajectories demonstrated by the expert, which the trained planner is expected to encounter most of the time. The second is a set of randomly generated artificial cost maps used to supply dangerous situations that the demonstrator rarely encounters. The random cost map generation process is as follows: some binary obstacle clusters are first randomly generated and then Gaussian-blurred to turn the obstacle map into probabilities. Subsequently, local target points are randomly drawn on the map. Finally, we transform the cost maps into the robot coordinate frame so that the robot pose is identical across all data sets, which is considered beneficial to sample efficiency.
The invention develops an imitation learning algorithm that realizes real-time, near-optimal local motion planning while maintaining good safety and universality for mobile robot applications. Unlike end-to-end modeling, we develop a local planning strategy that takes a preprocessed 2d local cost map as input. The local cost map may be constructed from local target points and obstacle maps containing multiple frames of information received from sensor devices such as lidar, sonar and depth cameras. At each sample time, our local motion planning model, embedded with a value iteration network, produces an action instruction through feed-forward reasoning, which is computationally efficient and enables planning-based reasoning. To train a robust model, we use a combination of real-world local obstacle maps collected from demonstration experiments and randomly generated artificial maps, which not only speeds up the data collection process but also supplements the dangerous observation samples rarely encountered in demonstrations. A brief comparison of the method presented herein and existing learning methods is summarized in table 1.
Table 1: comparison of local movement planning methods between local 2d mimic learning, existing end-to-end mimic learning, and traditional optimization-based methods.
The main contributions of the present application include at least:
1. The local motion planning is expressed as an imitation learning problem based on preprocessed 2d local cost map images. This formulation can incorporate Value Iteration Networks (VINs) and generalizes across a wide variety of sensor devices.
2. By randomly generating artificial local cost maps, the problem of a shortage of training data, particularly of scarce but dangerous event samples, is overcome.
In order to better illustrate the effectiveness and feasibility of the local motion planning method for the mobile robot, the local motion planning method for the mobile robot can be applied to a robot platform for experiments.
In the following, the experiments and evaluation of the local planner are described in detail.
A. Equipment
First, the robot platform used in the experiments was a Segway delivery robot (Loomo Go) equipped with an Intel RealSense sensor, ultrasonic sensors, and wheel encoders. The depth camera in the RealSense is used to maintain a fixed-size 2.8 m local occupancy map with a resolution of 0.1 m.
Second, for model training, we collected over 600k tuples in total, half from exemplary trajectories and half from random generation. The data set was divided into a training set (80%) and a test set (20%); the ratio of training set to test set may be adjusted according to actual requirements. The demonstration uses an optimization-based planner that samples 11 grid points for linear velocities in the range [0.0 m/s, 0.5 m/s] and 81 grid points for angular velocities in the range [-0.8 rad/s, 0.8 rad/s]. The neural network model was implemented in the TensorFlow deep learning framework and trained from scratch on an Nvidia Titan X for about 8 hours with the Adam optimizer, an algorithm for first-order gradient-based optimization of stochastic objective functions. Evaluation and deployment were performed on a laptop with an Intel i7-6700HQ CPU running the Ubuntu 14.04 operating system. FIG. 5 shows a schematic diagram of the Segway delivery robot, equipped with an Intel RealSense ZR300 (30 Hz RGB-depth, fisheye and IMU), an Intel Atom Z8750 (4 cores, 2.4 GHz) and 4 GB of memory; IMU stands for Inertial Measurement Unit.
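The 80/20 split mentioned above can be done along these lines; this is a generic sketch, not the authors' pipeline, and the 600-element list stands in for the roughly 600k demonstration tuples.

```python
import numpy as np

def split_dataset(samples, train_frac=0.8, seed=None):
    """Shuffle and split demonstration tuples into train and test sets;
    the ratio is adjustable, matching the 80/20 split used above."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    cut = int(train_frac * len(samples))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

data = list(range(600))            # stand-in for the demonstration tuples
train, test = split_dataset(data, seed=0)
print(len(train), len(test))  # 480 120
```

Shuffling before splitting keeps exemplary and randomly generated tuples mixed across both sets.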
B. Model index
We first evaluate the trained model frame by frame, comparing the prediction accuracy on the training and test data sets, and comparing the performance of the proposed learning-based planner against the optimization-based planner.
1) Training and testing accuracy: FIG. 6 shows a box plot of the absolute prediction error of the trained planner relative to the demonstrator. The errors of the linear and angular velocities are measured separately because their influence on the planning result differs. In fig. 6, the lower and upper edges of each box represent the first and third quartiles, respectively, the line in the box represents the median, and the points marked with "+" beyond the whiskers represent outliers. As can be seen from fig. 6, the first and third quartiles of the prediction error on both the training and test data sets are very small, and the training and testing accuracy of the trained planner is significantly higher than that of existing optimization-based planners. Here, the trained planner is obtained by using the local motion planning method for the mobile robot according to the present invention, i.e. it is the learning-based planner.
Table 2 summarizes detailed indicators of the planning accuracy. It can be noted that the average errors of linear and angular velocity on the test data set are as good as on the training data set, indicating that the learned model has strong generalization capability in the local planning task. We can also observe that the standard deviation of the errors on the test data set is higher than on the training data set, indicating that some significant prediction outliers still exist.
Training-v [ m/s] Training-w [ rad/s] Test-v [ m/s] Test-w [ rad/s]
Average 0.0031 0.0115 0.0037 0.0151
Standard of merit 0.0050 0.0132 0.0079 0.0308
Table 2: statistics of prediction errors
In addition to the average performance, we also investigated the cases in which large prediction errors occur. Figure 7 shows the action instructions provided by the trained planner and the demonstrator in three cases from different error ranges of the test data set. In fig. 7, the error is normal on the left (0.0024), large in the middle (0.1477), and very large on the right (0.5733). The robot is located at the origin of the 2d occupancy map, facing north in the figure. Isolated gray cells are targets, while the darkness of the other cells represents the obstacle probability. The length of each line represents the linear velocity and its direction represents the angular velocity.
In the left case, the trained planner predicts an action that keeps some distance from the obstacle clusters on both sides, almost overlapping the decision provided by the demonstrator. In the middle case, the trained planner differs slightly from the demonstrator, which may be due to the ambiguity introduced by a target point hidden in an obstacle cluster. When the local target lags behind the robot, as in the right case, it is difficult for the trained planner to produce exactly the same action as the demonstrator. However, the decision of the trained planner is still reasonable, as it turns toward the correct direction with a smoother change of speed.
2) Comparison with the optimization-based planner: a key motivation for the learning-based planner presented in the present invention is to reduce computation time, which is a disadvantage of the traditional optimization-based approach. Thus, we compare the performance of the trained planner with optimization-based planners in a complex planning environment. As described in Section D above, the number of candidate trajectories considered has a large impact on computation time and solution quality. In the following evaluations, we employ the demonstrator with 11 × 81 linear and angular velocity samples as the baseline planner, and evaluate the performance of the learning-based planner and of optimization-based planners with different numbers of samples (5 × 11, 7 × 31, 8 × 51, 9 × 61, 10 × 71). The optimality gap of each decision is defined as

ε = |v − v*| + |w − w*|

where v and w are the linear and angular velocities obtained from the evaluated planner, and v* and w* are obtained from the baseline reference.
Fig. 8 shows a comparison of the optimality gap between the learning-based planner and the optimization-based planners. As can be seen from fig. 8, the optimality error of the learning-based planner is approximately equal to that of the optimization-based planner with 750 sample trajectories.

Fig. 9 shows a comparison of the computation times of the learning-based planner and the optimization-based planners. As can be seen from fig. 9, the computation time of the learning-based planner is approximately equal to that of the optimization-based planner with 160 sample trajectories.

As shown in fig. 8 and 9, for the optimization-based planners, although the average optimality gap decreases as the number of samples increases, the computation time grows almost linearly. In contrast, the proposed learning-based planner provides a highly competitive solution quality at around 22 ms, significantly faster than the optimization-based approach. Note that the better performance of the learning-based planner does not imply longer computation times; it is related to the quality of the demonstration behavior and of the training data set. If equation (5a) were solved closer to optimality and used as the demonstrator, the quality of the trained strategy would be expected to improve further.
C. Navigation in a simulated environment
Based on accurate frame-by-frame motion prediction, in this section we evaluate the performance of the trained planner in navigation simulations. To check the ability to avoid obstacles, the global reference path for robot navigation is set to approach or cross obstacles. At each sample time, the trained planner receives the local cost map and returns a velocity vector, which drives the robot to a new state according to the kinematic model. The resulting trajectories are shown in fig. 10. In fig. 10, the dotted line represents the global reference path, and the solid line and the dashed line correspond to the trajectories of the learning-based planner and the optimization-based demonstrator, respectively. As can be seen from fig. 10, the trajectory of the trained planner successfully avoids obstacles when the global reference passes close to them, while smoothly following the reference in open space. Furthermore, the trajectory of the trained planner is nearly identical to the behavior of the demonstrator, which illustrates its high quality in imitating the near-optimal demonstrator.
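The simulation loop described above, in which the planner's velocity vector drives the robot to a new state through a kinematic model, can be sketched with a unicycle model; the stand-in planner, step size and horizon are assumptions for illustration.

```python
import numpy as np

def step(state, u, dt=0.1):
    """Unicycle kinematic model x_{k+1} = h(x_k, u); state = (x, y, heading)."""
    x, y, th = state
    v, w = u
    return (x + v * np.cos(th) * dt, y + v * np.sin(th) * dt, th + w * dt)

# Closed-loop simulation with a stand-in planner that always commands (0.5, 0.0);
# a real planner would map (local cost map, last velocity) to (v, w) each step.
planner = lambda local_cost_map, u_prev: (0.5, 0.0)
state, u_prev = (0.0, 0.0, 0.0), (0.0, 0.0)
trajectory = [state]
for _ in range(20):
    u = planner(None, u_prev)
    state = step(state, u)
    trajectory.append(state)
    u_prev = u
print(trajectory[-1])  # ≈ (1.0, 0.0, 0.0) after 2 s of straight driving
```

Replacing the lambda with a trained policy and feeding it the simulated local cost map at each step reproduces the evaluation loop used for fig. 10.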
D. Navigation in the real world
Finally, we deploy the trained model in real-world navigation experiments. Given a global reference path, the task of the Segway delivery robot is to follow the reference path while avoiding obstacles on the road. Two aspects are examined: the response to unexpected obstacles on the reference path, and robustness in long-term operation.
1) Reaction to obstacles on the global path: Figure 11 shows the response of the trained planner to an unexpected obstacle on the reference path in a real-world experiment. In fig. 11, the upper part shows the local obstacle map fed to the planner, and the lower part shows the corresponding view captured by the robot's front camera. The robot is located at the origin of the 2d local map, facing north. The length of the line represents the linear velocity and its direction represents the angular velocity.
As shown in fig. 11, the robot makes local planning decisions based on the local occupancy map constructed online. Although relatively small in scale, this map provides rich information about the environment. When an obstacle blocking the global trajectory is encountered, the trained planner successfully provides action instructions that drive the robot into the open area.
2) Long-term operation: long-term experiments were performed in a narrow corridor filled with obstacles. The task of the robot is to repeatedly follow the global reference path without colliding with any obstacle. Fig. 12 shows the navigation trajectories of two planners in a real-world environment: one trained on a hybrid dataset of demonstration trajectories and artificially generated local cost maps, the other trained only on the normal data collected from the demonstration experiments. In fig. 12, the darkness of the 2d occupancy map is proportional to the obstacle probability, and the task of the robot is to navigate repeatedly through the office corridor with the trained planner without hitting an obstacle.
As can be seen from fig. 12, the robot operated with the latter planner touched several obstacles in crowded areas and had to be stopped manually for safety. In contrast, the planner trained on the mixed dataset successfully drove the robot around obstacles and roamed the corridor in a robust and smooth manner.
In conclusion, the invention provides a preprocessed 2d local cost map and establishes, through deep imitation learning, a local motion planning system capable of avoiding obstacles. By embedding a value iteration network and training on a mixture of real and artificial cost map images, the developed local planner is highly competitive in decision quality, computation time, and robustness.
Future work may proceed in two directions. First, the distribution of the dataset and of the prediction error should be studied further, so that artificial cost maps covering unseen contexts can be generated in a more efficient and tractable manner. Second, beyond inferring the local planning strategy, we consider extending the proposed model with additional outputs that predict the uncertainty of action commands, which would be a valuable addition for safety-critical real-world operation.
The above description covers only specific embodiments of the present invention, but the scope of the invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall be covered by the scope of the invention. Therefore, the protection scope of the present invention shall be subject to the appended claims.

Claims (17)

1. A method for planning local motion of a mobile robot, the method comprising:
determining a local target point and a local obstacle map according to a given global path and the surrounding environment where the mobile robot is located, and determining a 2d local cost map image according to the local target point and the local obstacle map;
determining a velocity of the mobile robot;
formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image for execution by the mobile robot;
wherein formulating action instructions for the mobile robot through a learning-based planner comprises:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u=fθ(m,u') (1)
where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the current velocity vector of the mobile robot, θ is a model weight parameter, m is the 2d local cost map image, and f is the learning-based planner model.
2. The method of claim 1, wherein the determining a 2d local cost map image comprises:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
3. The method of claim 1 or 2, wherein determining the 2d local cost map comprises:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs (2)
where m_obs is the local obstacle map of obstacle probabilities and m_goal is a binary target map in which the pixel value of the local target point is set to 1 and all other pixel values are set to 0; λ is a hyper-parameter acting as a reward coefficient. If the nearest valid target point on the reference path lies outside the 2d local cost map window, it is replaced by its projection onto the map border.
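Formula (2) composes the cost map from the obstacle map and the binary goal map, clipping an out-of-window target onto the window border. A minimal NumPy sketch; the window size, λ value, and helper name are illustrative assumptions:

```python
import numpy as np

H, W = 64, 64        # illustrative size of the 2d local cost map window
lam = 2.0            # hyper-parameter lambda (reward coefficient); illustrative

def project_into_window(r, c, h, w):
    """If the nearest valid target point on the reference path falls
    outside the cost-map window, replace it with its projection onto
    the window border."""
    return min(max(r, 0), h - 1), min(max(c, 0), w - 1)

m_obs = np.random.rand(H, W)                 # obstacle probabilities in [0, 1]
m_goal = np.zeros((H, W))                    # binary target map
gr, gc = project_into_window(10, 90, H, W)   # target column lies outside -> (10, 63)
m_goal[gr, gc] = 1.0                         # target pixel set to 1, all others 0

m = lam * m_goal - m_obs                     # formula (2): reward minus penalty
```

With λ > 1, the goal pixel is guaranteed to carry the highest value in the map, so downstream planning is always attracted toward it.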
4. The method of claim 1 or 2, wherein determining the 2d local cost map comprises:
inputting the given 2d local cost map image and the given mobile robot velocity into a preset neural network model, and obtaining the 2d local cost map through learning by the preset neural network model, wherein the preset neural network model is a neural network model in which the convolution layers are omitted.
5. The method of claim 1 or 2, wherein determining the model weight parameter θ comprises:
given a set of exemplary action instructions D = {(m_i, u'_i, û_i), i = 1, ..., M}, the planner trains the required local motion planning strategy based on the error minimization criterion:
J(θ) = Σ_{i=1}^{M} |û_i − f_θ(m_i, u'_i)|² (3)
where (m_i, u'_i, û_i) is an exemplary tuple and J(θ) represents the error function, namely the cumulative sum of the squared absolute values of the differences between the exemplary action instruction û_i and the actual action command f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
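The error criterion of formula (3) is an ordinary sum-of-squares imitation loss over demonstration tuples (m_i, u'_i, û_i). In the sketch below, the linear stand-in for f_θ, the array sizes, and all sample values are illustrative assumptions, not the patent's network:

```python
import numpy as np

def planner(theta, m, u_prev):
    """Stand-in for f_theta: a linear map from (flattened cost map,
    previous velocity) to a commanded (v, w). Illustrative only."""
    x = np.concatenate([m.ravel(), u_prev])
    return theta @ x

def imitation_loss(theta, dataset):
    """Formula (3): cumulative squared difference between exemplary
    commands u_hat and the planner's actual commands."""
    return sum(np.sum((u_hat - planner(theta, m, u_prev)) ** 2)
               for m, u_prev, u_hat in dataset)

rng = np.random.default_rng(0)
m = rng.random((8, 8))                           # toy 8x8 cost map
dataset = [(m, np.zeros(2), np.array([0.5, 0.0]))]
theta = np.zeros((2, 8 * 8 + 2))                 # all-zero model weights
loss = imitation_loss(theta, dataset)            # 0.25 for the all-zero model
```

Minimizing this loss over θ (e.g. by gradient descent) yields the fixed weights that are then substituted into equation (1) at deployment time.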
6. The method of claim 1 or 2, wherein said formulating action instructions for the mobile robot by the learning-based planner comprises:
inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through the recursive operation:
v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u) [r + γ·v_k(s')] (4)
where s and r correspond to the position and the cost of a pixel in the 2d local cost map, respectively; p(s', r | s, u) represents the transition probability; k represents the number of iterations; γ represents a discount factor; v_{k+1}(s) represents the value at position s obtained after k + 1 iterations; v_k(s') represents the value at position s' after k iterations; u represents an action command executed by the mobile robot; and s' represents the position of the pixel reached in the 2d local cost map after the mobile robot executes action command u at position s;
selecting, from the high-level planning features extracted by formula (4), the features corresponding to the attention area;
and fusing the selected high-level planning features with the velocity of the mobile robot, and feeding the fused result to a fully connected layer of the deep neural network that generates smooth, continuous action commands, thereby obtaining the action instructions formulated for the mobile robot.
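The recursion of formula (4) is standard value iteration over the pixel grid. A minimal sketch under simplifying assumptions (deterministic 4-connected transitions, illustrative γ, grid size, and iteration count), where the reward of an action is the cost-map value of the pixel it enters:

```python
import numpy as np

def value_iteration(cost_map, k_iters=20, gamma=0.95):
    """Formula (4) with deterministic transitions: each action u moves
    one pixel (up/down/left/right), so
    v_{k+1}(s) = max_u [ r(s') + gamma * v_k(s') ]."""
    H, W = cost_map.shape
    v = np.zeros((H, W))
    for _ in range(k_iters):
        # candidate values of the four successor states s'; -inf marks
        # moves that would leave the grid
        q = np.full((4, H, W), -np.inf)
        q[0, 1:, :] = cost_map[:-1, :] + gamma * v[:-1, :]   # move up
        q[1, :-1, :] = cost_map[1:, :] + gamma * v[1:, :]    # move down
        q[2, :, 1:] = cost_map[:, :-1] + gamma * v[:, :-1]   # move left
        q[3, :, :-1] = cost_map[:, 1:] + gamma * v[:, 1:]    # move right
        v = q.max(axis=0)                                    # max over actions u
    return v

cost = -0.1 * np.ones((16, 16))   # small step penalty everywhere
cost[2, 2] = 2.0                  # reward at the local target pixel
v = value_iteration(cost)         # value surface peaks next to the target
```

In the value iteration network, this recursion is realized by convolution and max-pooling layers, and the resulting value map v serves as the high-level planning feature from which the attention area is selected.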
7. The method of claim 1 or 2, wherein the method further comprises:
selecting, as a demonstrator, an optimization-based planner provided with a preset amount of computing resources;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions:
min_u w_1·d_g² + w_2·α_g² + w_3·‖u‖² (5a)
subject to x_{k+1} = h(x_k, u), k = 0, ..., N−1 (5b)
m_obs(x_k) ≤ p̄_obs, k = 1, ..., N (5c)
where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, p̄_obs is the maximum obstacle probability at which passage is allowed, and w_1, w_2, w_3 are cost weight parameters.
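The constrained optimization (5a)–(5c) can be approximated by sampling candidate commands, rolling each out through the kinematic model h per (5b), rejecting rollouts that violate the obstacle bound (5c), and keeping the lowest-cost survivor. The cost form, candidate grid, and all numeric values below are illustrative assumptions, not the patent's exact formulation:

```python
import math
import itertools

def demonstrate(pose, goal, obstacle_prob, N=10, dt=0.1,
                w1=1.0, w2=0.5, w3=0.1, p_max=0.5):
    """Pick the constant command u = (v, w) minimizing a terminal cost
    (distance d_g and heading error alpha_g to the local target, plus a
    control-effort term) subject to the obstacle-probability bound."""
    best_u, best_cost = None, float("inf")
    for v, w in itertools.product([0.2, 0.6, 1.0],
                                  [-0.6, -0.3, 0.0, 0.3, 0.6]):
        x, y, th = pose
        feasible = True
        for _ in range(N):                       # rollout via kinematics (5b)
            x += v * math.cos(th) * dt
            y += v * math.sin(th) * dt
            th += w * dt
            if obstacle_prob(x, y) > p_max:      # constraint (5c)
                feasible = False
                break
        if not feasible:
            continue
        d_g = math.hypot(goal[0] - x, goal[1] - y)
        alpha_g = abs(math.atan2(goal[1] - y, goal[0] - x) - th)
        cost = w1 * d_g**2 + w2 * alpha_g**2 + w3 * (v**2 + w**2)  # (5a)
        if cost < best_cost:
            best_u, best_cost = (v, w), cost
    return best_u

free_space = lambda x, y: 0.0                    # no obstacles anywhere
u = demonstrate((0.0, 0.0, 0.0), goal=(2.0, 0.0), obstacle_prob=free_space)
```

A real MPC demonstrator would solve (5a)–(5c) with a proper numerical optimizer over a full command sequence; the coarse grid here only illustrates the objective/constraint structure.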
8. The method of claim 7, wherein the acquisition sources of training samples for the learning-based planner comprise:
generating a first cost map from the normal experimental trajectories of the demonstrator, wherein a normal experimental trajectory is a trajectory that the learning-based planner is expected to encounter with a probability within a predetermined range; and
generating, by human intervention, a second cost map containing dangerous situations unexpected by the demonstrator, for compensating for the small-probability events, overlooked by the demonstrator, that the learning-based planner will encounter.
9. A mobile robot local motion planning apparatus, the apparatus comprising:
the preprocessor is used for determining a local target point and a local obstacle map according to a given global path and the surrounding environment where the mobile robot is located, and determining a planar (2d) local cost map image according to the local target point and the local obstacle map; and determining a velocity of the mobile robot;
a controller for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot;
wherein, the controller is specifically configured to:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u=fθ(m,u') (1)
where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the current velocity vector of the mobile robot, θ is a model weight parameter, m is the 2d local cost map image, and f is the learning-based planner model.
10. The apparatus of claim 9, wherein the preprocessor is specifically configured to:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
11. The apparatus according to claim 9 or 10, wherein the preprocessor is specifically configured to:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs (2)
where m_obs is the local obstacle map of obstacle probabilities and m_goal is a binary target map in which the pixel value of the local target point is set to 1 and all other pixel values are set to 0; λ is a hyper-parameter acting as a reward coefficient. If the nearest valid target point on the reference path lies outside the 2d local cost map window, it is replaced by its projection onto the map border.
12. The apparatus according to claim 9 or 10, wherein the preprocessor is specifically configured to:
inputting the given 2d local cost map image and the given mobile robot velocity into a preset neural network model, and obtaining the 2d local cost map through learning by the preset neural network model, wherein the preset neural network model is a neural network model in which the convolution layers are omitted.
13. The apparatus of claim 9 or 10, wherein the controller determines the model weight parameter θ in a manner comprising:
given a set of exemplary action instructions D = {(m_i, u'_i, û_i), i = 1, ..., M}, the planner trains the required local motion planning strategy based on the error minimization criterion:
J(θ) = Σ_{i=1}^{M} |û_i − f_θ(m_i, u'_i)|² (3)
where (m_i, u'_i, û_i) is an exemplary tuple and J(θ) represents the error function, namely the cumulative sum of the squared absolute values of the differences between the exemplary action instruction û_i and the actual action command f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
14. The apparatus of claim 9 or 10, wherein the controller is specifically configured to:
inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through the recursive operation:
v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u) [r + γ·v_k(s')] (4)
where s and r correspond to the position and the cost of a pixel in the 2d local cost map, respectively; p(s', r | s, u) represents the transition probability; k represents the number of iterations; γ represents a discount factor; v_{k+1}(s) represents the value at position s obtained after k + 1 iterations; v_k(s') represents the value at position s' after k iterations; u represents an action command executed by the mobile robot; and s' represents the position of the pixel reached in the 2d local cost map after the mobile robot executes action command u at position s;
selecting, from the high-level planning features extracted by formula (4), the features corresponding to the attention area;
and fusing the selected high-level planning features with the velocity of the mobile robot, and feeding the fused result to a fully connected layer of the deep neural network that generates smooth, continuous action commands, thereby obtaining the action instructions formulated for the mobile robot.
15. The apparatus of claim 9 or 10, wherein the controller is further configured to:
selecting, as a demonstrator, an optimization-based planner provided with a preset amount of computing resources;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions:
min_u w_1·d_g² + w_2·α_g² + w_3·‖u‖² (5a)
subject to x_{k+1} = h(x_k, u), k = 0, ..., N−1 (5b)
m_obs(x_k) ≤ p̄_obs, k = 1, ..., N (5c)
where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, p̄_obs is the maximum obstacle probability at which passage is allowed, and w_1, w_2, w_3 are cost weight parameters.
16. The apparatus of claim 15, wherein the controller is further configured to:
generating a first cost map from the normal experimental trajectories of the demonstrator, wherein a normal experimental trajectory is a trajectory that the learning-based planner is expected to encounter with a probability within a predetermined range;
generating, by human intervention, a second cost map containing dangerous situations unexpected by the demonstrator, for compensating for the small-probability events, overlooked by the demonstrator, that the learning-based planner will encounter;
and using the first cost map and the second cost map as training samples of the learning-based planner.
17. A computer storage medium having computer-executable instructions stored thereon for performing the method of local motion planning for a mobile robot of any of claims 1 to 8.
CN201710987041.8A 2017-10-20 2017-10-20 Local motion planning method and device for mobile robot Active CN107861508B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710987041.8A CN107861508B (en) 2017-10-20 2017-10-20 Local motion planning method and device for mobile robot
PCT/CN2018/087326 WO2019076044A1 (en) 2017-10-20 2018-05-17 Mobile robot local motion planning method and apparatus and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710987041.8A CN107861508B (en) 2017-10-20 2017-10-20 Local motion planning method and device for mobile robot

Publications (2)

Publication Number Publication Date
CN107861508A CN107861508A (en) 2018-03-30
CN107861508B (en) 2021-04-20

Family

ID=61697686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710987041.8A Active CN107861508B (en) 2017-10-20 2017-10-20 Local motion planning method and device for mobile robot

Country Status (2)

Country Link
CN (1) CN107861508B (en)
WO (1) WO2019076044A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107861508B (en) * 2017-10-20 2021-04-20 纳恩博(北京)科技有限公司 Local motion planning method and device for mobile robot
CN110858328B (en) * 2018-08-06 2022-06-14 纳恩博(北京)科技有限公司 Data acquisition method and device for simulating learning and storage medium
CN109358618B (en) * 2018-09-28 2021-10-26 安徽工程大学 Path planning method of mobile robot
CN110046457B (en) * 2019-04-26 2021-02-05 百度在线网络技术(北京)有限公司 Human body model control method and device, electronic equipment and storage medium
CN111912407B (en) * 2019-05-08 2022-05-17 胡贤良 Path planning method of multi-robot system
CN110285813B (en) * 2019-07-01 2022-11-25 东南大学 Man-machine co-fusion navigation device and method for indoor mobile robot
CN110503065B (en) * 2019-08-28 2023-04-07 南京大学 Mobile equipment user action gesture recognition method based on distance measurement
JP7221839B2 (en) * 2019-10-08 2023-02-14 国立大学法人静岡大学 Autonomous Mobile Robot and Control Program for Autonomous Mobile Robot
CN112783147A (en) * 2019-11-11 2021-05-11 科沃斯机器人股份有限公司 Trajectory planning method and device, robot and storage medium
CN111079603A (en) * 2019-12-06 2020-04-28 青岛歌尔智能传感器有限公司 Step prediction method, controller, positioning device and readable storage medium
CN111288995B (en) * 2020-03-12 2022-05-13 深圳市人工智能与机器人研究院 Route planning method and route planning device of mobile robot and terminal equipment
CN111739099B (en) * 2020-07-20 2020-12-11 北京云迹科技有限公司 Falling prevention method and device and electronic equipment
WO2022027199A1 (en) * 2020-08-03 2022-02-10 深圳市大疆创新科技有限公司 Control method for movable platform, movable platform and storage medium
CN113741480B (en) * 2021-09-16 2024-06-28 中科南京软件技术研究院 Obstacle avoidance method based on combination of dynamic obstacle extraction and cost map
CN113805483B (en) * 2021-09-17 2022-07-12 中国人民解放军国防科技大学 Robot control method and device based on model prediction and computer equipment
CN114237242B (en) * 2021-12-14 2024-02-23 北京云迹科技股份有限公司 Method and device for controlling robot based on optical encoder
CN114355923B (en) * 2021-12-28 2024-04-02 杭州电子科技大学 MPC-based track planning and tracking method under A-guidance
CN115167434B (en) * 2022-07-21 2024-06-28 清华大学深圳国际研究生院 Local navigation obstacle avoidance method and robot
CN115421494A (en) * 2022-09-19 2022-12-02 西安交通大学 Cleaning robot path planning method, system, computer device and storage medium
CN115542901B (en) * 2022-09-21 2024-06-07 北京航空航天大学 Deformable robot obstacle avoidance method based on near-end strategy training
CN116911176B (en) * 2023-07-08 2024-04-30 哈尔滨理工大学 Terrain trafficability prediction method based on speed and vibration state of wheeled mobile robot
CN117232531B (en) * 2023-11-14 2024-01-30 长沙小钴科技有限公司 Robot navigation planning method, storage medium and terminal equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
KR20130112507A (en) * 2012-04-04 2013-10-14 인하대학교 산학협력단 Safe path planning method of a mobile robot using s× algorithm
CN106325275A (en) * 2016-09-14 2017-01-11 广州今甲智能科技有限公司 Robot navigation system, robot navigation method and robot navigation device
CN106774347A (en) * 2017-02-24 2017-05-31 安科智慧城市技术(中国)有限公司 Robot path planning method, device and robot under indoor dynamic environment
CN106774327A (en) * 2016-12-23 2017-05-31 中新智擎有限公司 A kind of robot path planning method and device
WO2017095591A1 (en) * 2015-12-02 2017-06-08 Qualcomm Incorporated Simultaneous mapping and planning by a robot

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN104573140A (en) * 2013-10-09 2015-04-29 北京军区军事训练模拟仿真研发服务中心 Layered dynamic path planning method applied to virtual simulation
CN105629974B (en) * 2016-02-04 2018-12-04 重庆大学 A kind of robot path planning method and system based on modified Artificial Potential Field Method
CN105955280A (en) * 2016-07-19 2016-09-21 Tcl集团股份有限公司 Mobile robot path planning and obstacle avoidance method and system
CN107861508B (en) * 2017-10-20 2021-04-20 纳恩博(北京)科技有限公司 Local motion planning method and device for mobile robot


Also Published As

Publication number Publication date
CN107861508A (en) 2018-03-30
WO2019076044A1 (en) 2019-04-25

Similar Documents

Publication Publication Date Title
CN107861508B (en) Local motion planning method and device for mobile robot
Chen et al. Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning
Chen et al. Deep imitation learning for autonomous driving in generic urban scenarios with enhanced safety
Dai et al. Fast frontier-based information-driven autonomous exploration with an mav
EP3788549B1 (en) Stacked convolutional long short-term memory for model-free reinforcement learning
Min et al. RNN-based path prediction of obstacle vehicles with deep ensemble
Grigorescu et al. Neurotrajectory: A neuroevolutionary approach to local state trajectory learning for autonomous vehicles
CN114846425A (en) Prediction and planning of mobile robots
Zhu et al. Off-road autonomous vehicles traversability analysis and trajectory planning based on deep inverse reinforcement learning
Liu et al. Map-based deep imitation learning for obstacle avoidance
CN103901891A (en) Dynamic particle tree SLAM algorithm based on hierarchical structure
Paz et al. Tridentnet: A conditional generative model for dynamic trajectory generation
Wiedemann et al. Robotic information gathering with reinforcement learning assisted by domain knowledge: An application to gas source localization
Agarwal et al. Imitative planning using conditional normalizing flow
Ramezani et al. UAV path planning employing MPC-reinforcement learning method considering collision avoidance
Kim et al. Learning forward dynamics model and informed trajectory sampler for safe quadruped navigation
Jacinto et al. Navigation of autonomous vehicles using reinforcement learning with generalized advantage estimation
WO2023242223A1 (en) Motion prediction for mobile agents
Rezaei et al. A deep learning-based approach for vehicle motion prediction in autonomous driving
EP3839830A1 (en) Trajectory estimation for vehicles
Artuñedo et al. Machine learning based motion planning approach for intelligent vehicles
Kvasnikov et al. Designing a computerized information processing system to build a movement trajectory of an unmanned aircraft vehicle
Xue et al. Combining Motion Planner and Deep Reinforcement Learning for UAV Navigation in Unknown Environment
Daryina et al. Unmanned vehicle’s control real-time method based on neural network and selection function
Natan et al. DeepIPC: Deeply integrated perception and control for an autonomous vehicle in real environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant