CN107861508B - Local motion planning method and device for mobile robot - Google Patents
Local motion planning method and device for mobile robot
- Application number: CN201710987041.8A
- Authority: CN (China)
- Legal status: Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0214—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0223—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
Abstract
The invention discloses a local motion planning method for a mobile robot, comprising the following steps: determining a planar (2d) local cost map image; determining a velocity of the mobile robot; and formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image, for execution by the mobile robot. The invention also discloses a local motion planning device for a mobile robot.
Description
Technical Field
The invention relates to the technical field of robots, in particular to a method and a device for planning local motion of a mobile robot.
Background
Obstacle avoidance motion planning is one of the basic skills an intelligent mobile robot is expected to master. Various algorithms have been developed over the last decade to enable robots to plan trajectories to target points, or follow reference paths, without colliding with obstacles. Despite significant advances, mobile robots remain far from human-level motion planning. For example, humans make motion decisions quickly and with negligible effort, adapt robustly to uncertainty and unforeseen obstacles, and move smoothly and naturally. Given sufficient localization and global path information, such as the Global Positioning System (GPS) and Google Maps, people navigate well under widely different conditions, whether walking, running or driving, thanks to powerful planning and decision-making capabilities. Mobile robots, in contrast, still struggle to achieve comparable performance.
One class of high-performance planning algorithms addresses local motion planning and obstacle avoidance from an optimization perspective. The main feature of these algorithms is minimizing a cost function (or maximizing a utility function) over candidate trajectories. The optimal trajectory, together with the corresponding control policy, is then fed to the robot controller at each cycle. This approach is mathematically interpretable. However, solving the optimization problem is challenging: because the objective function involves a robot dynamics model, and the constraints may contain components with complex geometry, the problem is typically non-linear and difficult to solve in real time on consumer robots with limited computational resources. To reduce computation time, researchers have proposed various methods, such as sampling and linearization, but these all degrade the result to sub-optimal solutions.
In addition to optimization-based approaches, Imitation Learning (IL) is becoming an emerging technique for planning and control. The core idea of imitation learning is to learn a policy from expert demonstrations, either by cloning the expected behavior or by inferring the reward pursued by the demonstrator. A fast-growing branch of imitation learning is the end-to-end approach, which maps raw sensory inputs directly to motor outputs without manual programming. End-to-end learning systems built on Convolutional Neural Networks (CNN) have been developed for a wide range of applications with exciting results. However, most existing CNN-based planners or controllers perform significantly below the demonstrated policy, in part because standard CNNs are not expressive enough for planning-based reasoning. Another challenge of imitation learning is the mismatch between the state distributions of the demonstrator and the learned policy, which can trap the learned policy in suboptimal or even divergent trajectories. Some approaches attempt to solve this by iteratively collecting training samples closer to the learned policy. However, the scarcity of catastrophic events in the training dataset remains a real operational risk.
Therefore, how to make the mobile robot efficiently avoid obstacles remains a technical problem to be solved urgently.
Disclosure of Invention
In view of the above, the present invention aims to provide a method and an apparatus for local motion planning of a mobile robot, offering a new imitation learning approach to local motion planning and obstacle avoidance that avoids obstacles efficiently, accelerates the mobile robot's local motion planning decisions, and makes those decisions as optimal, safe and general as possible.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a method for planning local motion of a mobile robot, where the method includes:
determining a planar (2d) local cost map (Costmap) image;
determining a velocity of the mobile robot;
formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image for execution by the mobile robot.
In the foregoing solution, optionally, the determining a 2d local cost map image includes:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
In the foregoing solution, optionally, the learning-based planner learns the local motion planning strategy by:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u = f_θ(m, u')   (1)
where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the current velocity vector of the mobile robot, θ is the model weight parameter, and m is the 2d local cost map image.
In the foregoing solution, optionally, the determining the 2d local cost map includes:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs   (2)
where m_obs is the local obstacle map of obstacle probabilities and m_goal is a binary target map in which the pixel at the local target point is set to 1 and all other pixels are set to 0; λ is a hyper-parameter serving as the reward coefficient. If the nearest valid sub-goal point on the reference path lies outside the 2d local cost map window, it is replaced by its projection onto the map border.
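The construction in equation (2) can be sketched as follows. This is an illustrative NumPy implementation under assumed conventions (the maps are NumPy arrays, and an out-of-window sub-goal is projected onto the border by clamping its coordinates), not the patent's actual implementation:

```python
import numpy as np

def build_local_costmap(obstacle_prob, goal_rc, lam=1.0):
    """Illustrative sketch of equation (2): m = lam * m_goal - m_obs.

    obstacle_prob: HxW array of obstacle probabilities (m_obs).
    goal_rc: (row, col) of the local target point; if it lies outside
             the map window it is projected onto the map border first.
    lam: hyper-parameter serving as the reward coefficient.
    """
    h, w = obstacle_prob.shape
    # Clamping to the window is equivalent to projecting an out-of-window
    # sub-goal onto the nearest point of the map border.
    r = min(max(goal_rc[0], 0), h - 1)
    c = min(max(goal_rc[1], 0), w - 1)

    m_goal = np.zeros_like(obstacle_prob)  # binary target map
    m_goal[r, c] = 1.0                     # target pixel 1, all others 0

    return lam * m_goal - obstacle_prob    # equation (2)

m = build_local_costmap(np.random.rand(64, 64), (70, 32), lam=2.0)
```

Here the goal at row 70 falls outside the 64-pixel window and is projected to the bottom border before the reward is placed.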
In the foregoing solution, optionally, the determining the 2d local cost map includes:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
In the foregoing solution, optionally, the determining the model weight parameter θ includes:
given a set of demonstration tuples D = {(m_i, u'_i, u_i*)}, the planner trains the required local motion planning strategy based on the error minimization criterion:
θ* = argmin_θ J(θ) = argmin_θ Σ_i ‖u_i* − f_θ(m_i, u'_i)‖²   (3)
where (m_i, u'_i, u_i*) is a demonstration tuple and J(θ) is the error function: the cumulative sum of the squared absolute differences between the demonstrated action command u_i* and the actual action command f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
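As an illustration of the training criterion, the following toy sketch minimizes the squared-error objective of equation (3) by gradient descent, with a linear model standing in for the neural network f_θ; the data dimensions, model, and learning rate are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstration set D = {(m_i, u'_i, u*_i)}: flattened cost maps,
# current velocities, and demonstrated action commands (all synthetic).
M = rng.normal(size=(100, 16))        # flattened 4x4 cost maps m_i
U_prev = rng.normal(size=(100, 2))    # current velocity vectors u'_i
X = np.hstack([M, U_prev])            # model input features
W_true = rng.normal(size=(18, 2))     # hidden "demonstrator" weights
U_star = X @ W_true                   # demonstrated commands u*_i

# Linear stand-in for f_theta(m, u'); train by gradient descent on
# J(theta) = sum_i || u*_i - f_theta(m_i, u'_i) ||^2   (equation (3))
theta = np.zeros((18, 2))
lr = 0.05
for _ in range(1000):
    residual = U_star - X @ theta
    grad = -2.0 * X.T @ residual / len(X)
    theta -= lr * grad

J = np.sum((U_star - X @ theta) ** 2)  # final training error
```

After training, θ is fixed, matching the deployment step described above in which equation (1) computes action commands from the frozen weights.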
In the foregoing solution, optionally, the formulating, by the learning-based planner, an action instruction for the mobile robot includes:
inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operations of policy improvement and truncated policy evaluation:
v_{k+1}(s) = max_u Σ_{s′,r} p(s′, r | s, u) [r + γ·v_k(s′)]   (4)
where s and r are the position and cost of a pixel in the 2d local cost map, respectively; p(s′, r | s, u) is the transition probability; k is the iteration count; γ is a discount factor; v_{k+1}(s) is the value at position s after k+1 iterations of the mobile robot; v_k(s′) is the value of the mobile robot at position s′ after k iterations; u is an action command executed by the mobile robot; and s′ is the pixel position in the 2d local cost map reached after the mobile robot executes action command u at position s;
determining, from the high-level planning features extracted by formula (4), the features corresponding to the region of interest (attention area);
and fusing the attention-area features with the velocity of the mobile robot, then feeding the fusion result to fully connected layers of the deep neural network that generate smooth, continuous action commands, thereby obtaining the action instructions formulated for the mobile robot.
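The recursion of formula (4) can be illustrated on a small grid. The sketch below assumes a deterministic 4-neighbour transition model (p is 1 for the cell an action moves into, and r is that cell's cost map value); the attention selection, velocity fusion, and fully connected stages are omitted:

```python
import numpy as np

def value_iteration(costmap, k_iters=20, gamma=0.95):
    """Illustrative sketch of the recursion in formula (4):
        v_{k+1}(s) = max_u sum_{s',r} p(s', r | s, u) [r + gamma * v_k(s')]
    under an assumed deterministic 4-neighbour transition model, where the
    reward r of an action is the cost map value of the cell moved into.
    """
    h, w = costmap.shape
    v = np.zeros((h, w))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    for _ in range(k_iters):
        q = np.full((len(moves), h, w), -np.inf)
        for a, (dr, dc) in enumerate(moves):
            for r in range(h):
                for c in range(w):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w:
                        q[a, r, c] = costmap[nr, nc] + gamma * v[nr, nc]
        v = q.max(axis=0)  # greedy improvement + truncated evaluation
    return v
```

Running this on a cost map built as in equation (2) yields a value map whose gradient points toward the local target while skirting high-probability obstacle cells.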
In the foregoing scheme, optionally, the method further includes:
selecting an optimization-based planner that can provide a preset number of computing resources as a demonstrator;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
x_{k+1} = h(x_k, u_k), k = 0, …, N−1   (5b)
where N is the length of the prediction horizon; x_k is the 2d pose of the mobile robot at time step k; d_g is the distance between the mobile robot and the local target point at time step N; α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N; h(x_k, u_k) is the kinematic model of the robot; the obstacle probability threshold is the maximum obstacle probability that a visited cell is allowed to have; and w_1, w_2, w_3 are cost weight parameters.
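For illustration only, the demonstrator's optimization (5a)–(5c) can be approximated by random shooting: sample candidate command sequences, propagate them through an assumed unicycle kinematic model standing in for h, reject sequences that violate the obstacle-probability constraint of (5c), and keep the one minimizing a terminal cost built from d_g and α_g. The exact objective (5a) and its weights are not fully specified in the text, so the cost below is a hedged guess:

```python
import numpy as np

def demonstrator_command(x0, goal, obstacle_prob, n_horizon=10, dt=0.2,
                         n_samples=256, w1=1.0, w2=0.5, p_max=0.2, seed=0):
    """Sampling-based stand-in for the optimization-based demonstrator.

    x0: (x, y, theta) start pose; goal: (gx, gy) local target point;
    obstacle_prob: HxW obstacle probability grid (1 cell = 1 unit).
    All parameters here are assumptions for the sketch.
    """
    rng = np.random.default_rng(seed)
    best_u, best_cost = np.zeros(2), np.inf
    for _ in range(n_samples):
        u_seq = rng.uniform([0.0, -1.0], [1.0, 1.0], size=(n_horizon, 2))
        x, y, th = x0
        feasible = True
        for v, w in u_seq:                      # x_{k+1} = h(x_k, u_k)  (5b)
            x += v * np.cos(th) * dt            # assumed unicycle model
            y += v * np.sin(th) * dt
            th += w * dt
            r, c = int(round(y)), int(round(x))
            if not (0 <= r < obstacle_prob.shape[0]
                    and 0 <= c < obstacle_prob.shape[1]) \
                    or obstacle_prob[r, c] > p_max:  # constraint (5c)
                feasible = False
                break
        if not feasible:
            continue
        d_g = np.hypot(goal[0] - x, goal[1] - y)         # terminal distance
        a_g = abs(np.arctan2(goal[1] - y, goal[0] - x) - th)  # heading error
        cost = w1 * d_g + w2 * a_g              # guessed terminal terms of (5a)
        if cost < best_cost:
            best_cost, best_u = cost, u_seq[0]
    return best_u  # first command of the best feasible sequence
```

A true demonstrator would solve (5a)–(5c) with a nonlinear optimizer; random shooting merely makes the receding-horizon structure concrete.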
In the foregoing solution, optionally, the acquisition source of the training samples of the learning-based planner includes:
generating a first cost map from normal experimental trajectories of the demonstrator, where a normal experimental trajectory is one that the learning-based planner is expected to encounter with probability within a predetermined range;
and generating, by human intervention, a second cost map containing dangerous situations unexpected by the demonstrator, to compensate for the low-probability events that the demonstrator overlooks but the learning-based planner will encounter.
In a second aspect, an embodiment of the present invention provides a local motion planning apparatus for a mobile robot, where the apparatus includes:
the preprocessor is used for determining a plane 2d local cost map image; determining a velocity of the mobile robot;
a controller for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot.
In the foregoing scheme, optionally, the preprocessor is specifically configured to:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
In the foregoing scheme, optionally, the controller is specifically configured to:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u = f_θ(m, u')   (1)
where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the current velocity vector of the mobile robot, θ is the model weight parameter, and m is the 2d local cost map image.
In the foregoing scheme, optionally, the preprocessor is specifically configured to:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs   (2)
where m_obs is the local obstacle map of obstacle probabilities and m_goal is a binary target map in which the pixel at the local target point is set to 1 and all other pixels are set to 0; λ is a hyper-parameter serving as the reward coefficient. If the nearest valid target point on the reference path lies outside the 2d local cost map window, it is replaced by its projection onto the map border.
In the foregoing scheme, optionally, the preprocessor is specifically configured to:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
In the foregoing solution, optionally, the manner for determining the model weight parameter θ by the controller includes:
given a set of demonstration tuples D = {(m_i, u'_i, u_i*)}, the planner trains the required local motion planning strategy based on the error minimization criterion:
θ* = argmin_θ J(θ) = argmin_θ Σ_i ‖u_i* − f_θ(m_i, u'_i)‖²   (3)
where (m_i, u'_i, u_i*) is a demonstration tuple and J(θ) is the error function: the cumulative sum of the squared absolute differences between the demonstrated action command u_i* and the actual action command f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
In the foregoing scheme, optionally, the controller is specifically configured to:
inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through the recursive operation:
v_{k+1}(s) = max_u Σ_{s′,r} p(s′, r | s, u) [r + γ·v_k(s′)]   (4)
where s and r are the position and cost of a pixel in the 2d local cost map, respectively; p(s′, r | s, u) is the transition probability; k is the iteration count; γ is a discount factor; v_{k+1}(s) is the value at position s after k+1 iterations of the mobile robot; v_k(s′) is the value of the mobile robot at position s′ after k iterations; u is an action command executed by the mobile robot; and s′ is the pixel position in the 2d local cost map reached after the mobile robot executes action command u at position s;
determining, from the high-level planning features extracted by formula (4), the features corresponding to the region of interest (attention area);
and fusing the attention-area features with the velocity of the mobile robot, then feeding the fusion result to fully connected layers of the deep neural network that generate smooth, continuous action commands, thereby obtaining the action instructions formulated for the mobile robot.
In the foregoing solution, optionally, the controller is further configured to:
selecting an optimization-based planner that can provide a preset number of computing resources as a demonstrator;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
x_{k+1} = h(x_k, u_k), k = 0, …, N−1   (5b)
where N is the length of the prediction horizon; x_k is the 2d pose of the mobile robot at time step k; d_g is the distance between the mobile robot and the local target point at time step N; α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N; h(x_k, u_k) is the kinematic model of the robot; the obstacle probability threshold is the maximum obstacle probability that a visited cell is allowed to have; and w_1, w_2, w_3 are cost weight parameters.
In the foregoing solution, optionally, the controller is further configured to:
generating a first cost map from normal experimental trajectories of the demonstrator, where a normal experimental trajectory is one that the learning-based planner is expected to encounter with probability within a predetermined range;
generating, by human intervention, a second cost map containing dangerous situations unexpected by the demonstrator, to compensate for the low-probability events that the demonstrator overlooks but the learning-based planner will encounter;
and using the first cost map and the second cost map as training samples of the learning-based planner.
In a third aspect, an embodiment of the present invention provides a computer storage medium, where a computer program is stored in the computer storage medium, where the computer program is used to execute the local motion planning method for a mobile robot described above.
The invention provides a local motion planning method and device for a mobile robot: a 2d local cost map image is determined; the velocity of the mobile robot is determined; and action instructions are formulated for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image, for execution by the mobile robot. In this way, obstacles can be avoided efficiently, the mobile robot's local motion planning decisions are accelerated, and the decisions made are as optimal, safe and general as possible.
Drawings
Fig. 1 is a schematic flow chart of an implementation of a local motion planning method for a mobile robot according to the present invention;
fig. 2 is a schematic structural diagram of a local motion planning apparatus for a mobile robot according to the present invention;
FIG. 3 is a block diagram of a local mobility planning system with a policy network according to the present invention;
FIG. 4 is a schematic diagram of a locally planned deep neural network provided by the present invention;
FIG. 5 is a schematic diagram of the Segway delivery robot provided in the present invention;
FIG. 6 is a diagram of the absolute prediction error between the trained planner and the demonstrator provided by the present invention;
FIG. 7 is a diagram of the action instructions given by the trained planner and the demonstrator for three cases within different error ranges of the test data set;
FIG. 8 is a schematic diagram illustrating a comparison of the optimal gap between a learning-based planner and an optimization-based planner provided by the present invention;
FIG. 9 is a schematic diagram illustrating a comparison of computation time between a learning-based planner and an optimization-based planner provided by the present invention;
FIG. 10 is a navigation diagram in a simulation environment provided by the present invention;
FIG. 11 is a schematic diagram of the response of the trained planner provided by the present invention to an unexpected obstacle on a reference path in a real-world experiment;
FIG. 12 is a schematic diagram of a navigation trajectory from a real-world environment provided by the present invention.
Detailed Description
In order to better explain the present invention, some prior studies on imitation learning are described below.
Current research on imitation learning focuses largely on end-to-end approaches. For example, convolutional neural network models have been trained to map visual input images to left/right steering commands for ground mobile robots, quad-rotor aircraft, and autonomous cars. Another recent study proposed a target-driven motion planner that generates linear and angular velocities from the raw input of a laser rangefinder. Despite these advances, the expressiveness of the network models for planning remains a challenge for existing policy imitation approaches.
To improve planning-based reasoning, new neural network architectures have recently been investigated in several works. An end-to-end architecture called the predictron was introduced; its core is an abstract model, represented as a Markov reward process, that can roll forward value estimates. Another deep neural network architecture with a similar motivation is the value iteration network, which includes a special module for recursive value iteration. Other researchers developed a recurrent network as a representation of path-integral optimal control to learn cost and dynamics models.
Another practical challenge of imitation learning is the mismatch between the data distributions of the demonstrated and learned policies. To address this challenge, researchers proposed an iterative data aggregation method (DAgger) and applied it to learning reactive controllers for micro aerial vehicles. This approach was further extended to SafeDAgger, which trains a safety policy to prevent the learning process from falling into dangerous states while reducing the frequency of human intervention. Another recent article addresses the problem by gradually closing the distribution gap between the demonstrated and learned policies, using an adaptive model predictive controller as a demonstrator that adjusts its policy.
However, the scarcity of catastrophic events in the training data sets of these approaches remains a drawback in safety-critical applications.
Based on the above, the invention provides an imitation learning method for local motion planning and obstacle avoidance of a mobile robot. The main goal is to speed up the mobile robot's local motion planning decisions while making those decisions as optimal, safe and general as possible.
The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.
The embodiment of the invention provides a local motion planning method for a mobile robot, which mainly comprises the following steps of:
and step 101, determining a plane (2d) local cost map image.
Optionally, the determining a 2d local cost map image includes:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
Here, the predetermined sensor may be one or a plurality of sensors. For example, the predetermined sensor may be a camera or a video camera on the mobile robot.
And 102, determining the speed of the mobile robot.
As one embodiment, the determining the velocity of the mobile robot includes:
determining first position information of the mobile robot at a first moment;
determining second position information of the mobile robot at a second moment; wherein the first time is a time before the second time;
and determining the speed of the mobile robot according to the first position information, the second position information, the first time and the second time.
For example, the speed of the mobile robot is (second position information — first position information)/(second time-first time).
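The finite-difference estimate above can be written directly:

```python
import numpy as np

def estimate_velocity(p1, p2, t1, t2):
    """Velocity estimate described above:
    (second position - first position) / (second time - first time)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return (p2 - p1) / (t2 - t1)

# Robot moves 0.5 m along x in 0.25 s, i.e. 2 m/s along x.
v = estimate_velocity((0.0, 0.0), (0.5, 0.0), t1=0.0, t2=0.25)
```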
Of course, the manner of determining the speed of the mobile robot is not limited to the above-listed form, and may be determined by other means. For example, data is directly obtained from a speed sensor of the mobile robot, and the speed of the mobile robot is determined.
And 103, formulating action instructions for the mobile robot through a learning-based planner based on the speed and the 2d local cost map image so as to execute the action instructions by the mobile robot.
Here, the action instruction includes:
the linear velocity and the angular velocity that the mobile robot needs to execute.
In this way, by executing motion commands determined in the above manner, the mobile robot can avoid obstacles effectively, and such commands can be computed in a short time.
The learning-based planner is a trained planner and can rapidly and accurately make action instructions for the mobile robot.
Illustratively, the learning-based planner learns a local motion planning strategy by:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u = f_θ(m, u')   (1)
where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the current velocity vector of the mobile robot, θ is the model weight parameter, and m is the 2d local cost map image; the cost map may be determined in a variety of ways.
In the above scheme, one of the determination manners of the 2d local cost map is as follows:
determining a cost map according to equation (2), wherein the cost map is constructed as a sum of local target rewards and obstacle penalties:
m = λ·m_goal − m_obs   (2)
where m_obs is the local obstacle map of obstacle probabilities and m_goal is a binary target map in which the pixel at the local target point is set to 1 and all other pixels are set to 0; λ is a hyper-parameter serving as the reward coefficient. If the nearest valid target point on the reference path lies outside the 2d local cost map window, it is replaced by its projection onto the map border.
In the above scheme, the second determination method of the 2d local cost map is as follows:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
In the above scheme, the determining the model weight parameter θ includes:
given a set of exemplary action instructionsThe planner trains the required local motion planning strategy based on the error minimization criterion:
wherein the content of the first and second substances,is an exemplary tuple, J (θ) represents an error function, is an exemplary action instructionAnd the actual action command fθ(mi,u'i) A cumulative sum of squares of absolute values of the differences;
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
In the above solution, the formulating an action instruction for the mobile robot by the learning-based planner includes:
inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through the recursive operation of policy improvement and truncated policy evaluation:
v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u)[r + γ v_k(s')] (4)
where s and r correspond to the position and cost of a pixel in the 2d local cost map, respectively, and p(s', r | s, u) is the transition probability; k is the number of recursions; γ is a discount factor; v_{k+1}(s) is the value at position s after k+1 iterations; v_k(s') is the value at position s' after k iterations; u is an action command executed by the mobile robot; and s' is the pixel position in the 2d local cost map reached after the mobile robot executes action command u at position s;
selecting, from the high-level planning features extracted by formula (4), the features belonging to the region of interest;
and fusing the high-level planning features from the region of interest with the speed of the mobile robot, and feeding the fusion result to a fully connected layer of the deep neural network that generates smooth, continuous action commands, thereby obtaining the action instructions formulated for the mobile robot, so that the mobile robot can imitate the demonstrated behavior and avoid obstacles in local motion planning.
Here, the region of interest is a partial region of the cost map.
Extracting features only from the region of interest, for example only the features aligned with the direction of the target point, helps obtain the action instruction quickly.
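The value-iteration recursion described above can be sketched on a toy 1-d grid standing in for cost-map pixels; the grid size, action set, rewards, and deterministic transitions are illustrative assumptions:

```python
import numpy as np

# Toy 1-d grid world; transitions are deterministic, so the expectation in
# equation (4) collapses to v_{k+1}(s) = max_u [ r(s') + gamma * v_k(s') ].
def value_iteration(reward, gamma=0.9, K=36):
    n = len(reward)
    v = np.zeros(n)
    for _ in range(K):                     # K recursions (the patent uses K=36)
        v_new = np.empty(n)
        for s in range(n):
            # actions: move left, stay, move right (clamped at the borders)
            v_new[s] = max(reward[min(max(s + du, 0), n - 1)]
                           + gamma * v[min(max(s + du, 0), n - 1)]
                           for du in (-1, 0, 1))
        v = v_new
    return v

reward = np.zeros(8)
reward[7] = 1.0                            # the goal pixel carries the reward
v = value_iteration(reward)
print(np.round(v, 3))                      # values grow toward the goal pixel
```

The resulting value map is exactly the kind of planning feature the module hands to the later layers: it encodes, for every pixel, how promising it is to move there.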
In the above scheme, the method further comprises:
selecting, as the demonstrator, an optimization-based planner provided with a preset amount of computing resources; here, the preset amount may be understood as a large amount;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
xk+1=h(xk,u),k=0,...,N-1 (5b)
where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from its position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, p̄_obs is the maximum obstacle probability the robot is allowed to traverse, and w_1, w_2, w_3 are cost weight parameters.
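The demonstrator's sampling-based optimization can be sketched as follows, assuming a unicycle kinematic model for h(x_k, u), the 11 × 81 velocity grid, and illustrative cost weights; the third cost term of (5a) is omitted since it is not specified here:

```python
import math

# Sketch of the demonstrator: sample (v, w) candidates, roll each out with a
# unicycle model held constant over the horizon (eq. (5b)), reject rollouts
# whose obstacle probability exceeds the allowed maximum (constraint (5c)),
# and pick the candidate minimizing w1*d_g + w2*alpha_g (cf. (5a)).
def rollout(x, v, w, N=10, dt=0.1):
    px, py, th = x
    for _ in range(N):                       # x_{k+1} = h(x_k, u)
        px += v * math.cos(th) * dt
        py += v * math.sin(th) * dt
        th += w * dt
    return px, py, th

def plan(x0, goal, obstacle_prob, p_max=0.5, w1=1.0, w2=0.5):
    best, best_cost = None, float("inf")
    for i in range(11):                      # 11 linear-velocity samples
        v = 0.05 * i                         # 0.0 .. 0.5 m/s
        for j in range(81):                  # 81 angular-velocity samples
            w = -0.8 + 0.02 * j              # -0.8 .. 0.8 rad/s
            px, py, th = rollout(x0, v, w)
            if obstacle_prob(px, py) > p_max:
                continue                     # violates constraint (5c)
            d_g = math.hypot(goal[0] - px, goal[1] - py)
            a_g = abs((math.atan2(goal[1] - py, goal[0] - px) - th + math.pi)
                      % (2 * math.pi) - math.pi)
            cost = w1 * d_g + w2 * a_g       # terminal cost terms of (5a)
            if cost < best_cost:
                best, best_cost = (v, w), cost
    return best

free = lambda x, y: 0.0                      # obstacle-free toy map
u = plan((0.0, 0.0, 0.0), goal=(1.0, 0.0), obstacle_prob=free)
print(u)
```

On the obstacle-free toy map the best candidate drives straight toward the goal at the maximum sampled speed, which matches the intuition that cost falls with the terminal distance d_g and heading error α_g.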
Illustratively, the acquisition sources of training samples for the learning-based planner include:
generating a first cost map from the normal experimental trajectories of the demonstrator, which are the trajectories the learning-based planner is expected to encounter within a predetermined proportion of the time;
and generating, by human intervention, a second cost map containing dangerous situations unexpected by the demonstrator, to compensate for the low-probability events that the demonstrator overlooks but the learning-based planner will encounter.
Here, the predetermined proportion may be understood as follows: most of the time, the learning-based planner will encounter such trajectories.
Optionally, the method for generating the second cost map includes:
firstly, randomly generating a preset number of binary obstacle clusters;
then, applying Gaussian blur to the binary obstacle clusters to convert them into an obstacle probability map;
and finally, randomly placing the local target points on the obstacle map.
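The three generation steps above can be sketched as follows; the map size, cluster shape, and the small separable kernel standing in for a full Gaussian blur are illustrative assumptions:

```python
import numpy as np

# Sketch of the random cost-map generator: scatter binary obstacle clusters,
# blur them into an obstacle-probability map, draw a random local target,
# and combine via equation (2).
rng = np.random.default_rng(7)

def random_cost_map(size=40, n_clusters=5, lam=2.0):
    obs = np.zeros((size, size))
    for _ in range(n_clusters):              # step 1: binary obstacle clusters
        r, c = rng.integers(2, size - 2, size=2)
        obs[r - 1:r + 2, c - 1:c + 2] = 1.0
    kernel = np.array([0.25, 0.5, 0.25])     # step 2: blur -> probability map
    for axis in (0, 1):
        obs = np.apply_along_axis(np.convolve, axis, obs, kernel, mode="same")
    obs = np.clip(obs, 0.0, 1.0)
    goal = np.zeros_like(obs)
    gr, gc = rng.integers(0, size, size=2)   # step 3: random local target
    goal[gr, gc] = 1.0
    return lam * goal - obs                  # equation (2)

m = random_cost_map()
print(m.shape)
```

Each call yields a fresh map, so rare hazardous layouts can be produced in bulk without running the demonstrator.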
The local motion planning method for the mobile robot can efficiently avoid obstacles and accelerate the mobile robot's local motion planning decisions, while making those decisions as optimal, safe and general as possible.
Example two
The embodiment provides a local motion planning device for a mobile robot, which comprises:
a preprocessor 10 for determining a planar 2d local cost map image; determining a velocity of the mobile robot;
a controller 20 for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot.
In the above solution, the mobile robot includes a learning-based planner.
As an embodiment, the preprocessor 10 is specifically configured to:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
Specifically, how the learning-based planner learns the local motion planning strategy, how to select the demonstrator, how to obtain the training sample, and other processing manners can be realized by referring to corresponding contents in the mobile robot local motion planning method; and will not be described in detail herein.
As an embodiment, the controller 20 is specifically configured to:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u=fθ(m,u') (1)
where u = (v, w) is the vector of the linear velocity v and angular velocity w to be executed, u' is the velocity vector of the mobile robot, θ is the model weight parameter, and m is the 2d local cost map image.
As an embodiment, the preprocessor 10 is specifically configured to:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m=λmgoal-mobs (2)
where m_obs is the local obstacle map of obstacle probabilities, m_goal is a binary target map in which the pixel value of the local target point is set to 1 and all other pixel values are set to 0, and λ is a hyperparameter related to the reward coefficient; if the nearest valid target point on the reference path lies outside the 2d local cost map window, the valid target point is replaced by its projection onto the map border.
As an embodiment, the preprocessor 10 is specifically configured to:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
As an embodiment, the manner in which the controller 20 determines the model weight parameter θ includes:
given a set of demonstrated action instructions, the planner trains the required local motion planning strategy based on the error minimization criterion:
J(θ) = Σ_i |u_i* − f_θ(m_i, u'_i)|² (3)
where (m_i, u'_i, u_i*) is a demonstration tuple and J(θ) is the error function, i.e. the cumulative sum of the squared absolute differences between the demonstrated action instruction u_i* and the actual action command f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and substituted into equation (1) during deployment, so that equation (1) computes action commands based on the learned model weight parameter θ.
As an embodiment, the controller 20 is specifically configured to:
inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operation:
v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u)[r + γ v_k(s')] (4)
where s and r correspond to the position and cost of a pixel in the 2d local cost map, respectively, and p(s', r | s, u) is the transition probability; k is the number of recursions; γ is a discount factor; v_{k+1}(s) is the value at position s after k+1 iterations; v_k(s') is the value at position s' after k iterations; u is an action command executed by the mobile robot; and s' is the pixel position in the 2d local cost map reached after the mobile robot executes action command u at position s;
selecting, from the high-level planning features extracted by formula (4), the features belonging to the region of interest;
and fusing the high-level planning features from the region of interest with the speed of the mobile robot, and feeding the fusion result to a fully connected layer of the deep neural network that generates smooth, continuous action commands, thereby obtaining the action instructions formulated for the mobile robot.
As an embodiment, the controller 20 is further configured to:
selecting, as the demonstrator, an optimization-based planner provided with a preset amount of computing resources;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
xk+1=h(xk,u),k=0,...,N-1 (5b)
where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from its position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, p̄_obs is the maximum obstacle probability the robot is allowed to traverse, and w_1, w_2, w_3 are cost weight parameters.
As an embodiment, the controller 20 is further configured to:
generating a first cost map from the normal experimental trajectories of the demonstrator, which are the trajectories the learning-based planner is expected to encounter most of the time;
generating, by human intervention, a second cost map containing dangerous situations unexpected by the demonstrator, to compensate for the low-probability events that the demonstrator overlooks but the learning-based planner will encounter;
and using the first cost map and the second cost map as training samples of the learning-based planner.
Those skilled in the art should understand that the functions implemented by each processing module in the mobile robot local motion planning apparatus shown in fig. 2 can be understood by referring to the related description of the mobile robot local motion planning method. Those skilled in the art will appreciate that the functions of each processing unit in the local motion planning apparatus for a mobile robot shown in fig. 2 can be implemented by a program running on a processor, and can also be implemented by a specific logic circuit.
The local motion planning device for the mobile robot enables the mobile robot to efficiently avoid obstacles and to make local motion planning decisions quickly, while making those decisions as optimal, safe and general as possible.
EXAMPLE III
Based on the method and the device for planning the local motion of the mobile robot in the first and second embodiments, a method for local motion planning and obstacle avoidance of a mobile robot based on deep imitation learning is given below. The main goal is to speed up the mobile robot's local motion planning decisions, while making those decisions as optimal, safe and general as possible.
A. System architecture
Fig. 3 is a block diagram of a local motion planning system with a policy network. As can be seen from fig. 3, the system mainly comprises two planning blocks. The first block preprocesses the raw sensing data and generates a local occupancy map describing the surrounding obstacles, together with a local target point extracted from the global path according to the pose of the robot. These intermediate results are then fed to the second block, where we use a deep neural network to imitate a local planning strategy. In addition, we also provide the speed of the robot as a network input to improve the smoothness of sequential decisions. During deployment, the proposed neural network strategy generates action commands by performing a feed-forward computation at each sample time, and is therefore computationally efficient and well suited to real-time decision-making.
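The per-sample-time deployment loop can be sketched as follows; the proportional stub policy stands in for the trained network u = f_θ(m, u') and is purely an assumption, as are the gains, limits, and the unicycle model:

```python
import math

# Sketch of the closed loop: each sample time, the preprocessing stage
# expresses the local goal in the robot frame, the policy does one forward
# pass, and the robot integrates the command with its kinematic model.
def stub_policy(local_goal, u_prev):
    # steer toward the local goal (u_prev mirrors the u' input of f_theta)
    v = min(0.5, math.hypot(*local_goal))            # capped linear velocity
    w = max(-0.8, min(0.8, math.atan2(local_goal[1], local_goal[0])))
    return v, w

def step(pose, u, dt=0.1):                           # unicycle kinematic model
    x, y, th = pose
    v, w = u
    return (x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, th + w * dt)

pose, u = (0.0, 0.0, 0.0), (0.0, 0.0)
goal = (2.0, 1.0)
for _ in range(100):                                 # one decision per sample time
    dx, dy = goal[0] - pose[0], goal[1] - pose[1]
    th = pose[2]
    local = (dx * math.cos(th) + dy * math.sin(th),  # goal in the robot frame
             -dx * math.sin(th) + dy * math.cos(th))
    u = stub_policy(local, u)
    pose = step(pose, u)

print(round(math.hypot(goal[0] - pose[0], goal[1] - pose[1]), 3))
```

Because each decision is a single forward pass, the loop's cost per cycle is constant, which is the property the text attributes to the proposed network policy.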
B. Problem formulation and how to learn local motion planning strategies
Given the 2d local cost map image and the mobile robot speed, the local motion planning strategy provides the following action commands:
u=fθ(m,u') (1)
where u = (v, w) is the vector of the linear velocity v and angular velocity w to be executed, u' is the latest velocity vector of the mobile robot, θ is the model weight parameter, and m is the 2d local cost map image; the cost map may be determined in a variety of ways.
Specifically, the cost map is constructed as the sum of local target reward and obstacle penalty:
m=λmgoal-mobs (2)
where m_obs is the local obstacle map of obstacle probabilities, m_goal is a binary target map in which the pixel value of the local target point is set to 1 and all other pixel values are set to 0, and λ is a hyperparameter related to the reward coefficient; if the nearest valid target point on the reference path lies outside the 2d local cost map window, the valid target point is replaced by its projection onto the map border.
It should be noted that the cost map can also be determined by other ways:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
Note that the cost map can also be learned within the neural network by adding an extra convolutional layer at the front of the network. However, in our empirical experiments we observed very little difference in inference accuracy. Two possible reasons are: the cost map calculated from equation (2) already captures the essence of the demonstrator's cost function, and the learned policy ultimately adapts to the cost map. In this work, we removed the cost-learning convolutional layer to reduce model redundancy.
Given a set of demonstrated action instructions, the planner trains the required local motion planning strategy based on the error minimization criterion:
J(θ) = Σ_i |u_i* − f_θ(m_i, u'_i)|² (3)
where (m_i, u'_i, u_i*) is a demonstration tuple and J(θ) is the error function, i.e. the cumulative sum of the squared absolute differences between the demonstrated action instruction u_i* and the actual action command f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and substituted into equation (1) during deployment, so that equation (1) computes action commands based on the learned model weight parameter θ.
C. Neural network model, used to represent f_θ(m, u')
Inputting the 2d local cost map image into a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through the recursive operation of policy improvement and truncated policy evaluation:
v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u)[r + γ v_k(s')] (4)
where s and r correspond to the position and cost of a pixel in the cost map, respectively, and p(s', r | s, u) is the transition probability; k is the number of recursions; γ is a discount factor with a value between 0 and 1; v_{k+1}(s) is the value at position s after k+1 iterations; v_k(s') is the value at position s' after k iterations; u is an action command executed by the mobile robot; and s' is the pixel position in the 2d local cost map reached after the mobile robot executes action command u at position s;
selecting, from the high-level planning features extracted by formula (4), the features belonging to the region of interest, such as the Q values in fig. 4;
the high-level planning features from the region of interest are fused with the speed of the mobile robot, and the fused result is fed to a fully connected layer that generates smooth, continuous action commands, so as to imitate the expected behavior and avoid obstacles in local motion planning.
Here, the region of interest is a partial region on the 2d local cost map.
Here, the feature extracted from the region of interest, for example, only the feature that coincides with the direction in which the target point is oriented, contributes to quick acquisition of the motion instruction.
Fig. 4 shows a schematic diagram of the deep neural network for local planning. In fig. 4, the parameters of a convolutional layer denote filter size, depth and stride, and the parameters of a fully connected layer denote the size of the output units. In the experiments, the number of recursions K was set to 36; the choice of K is related to the size of the local cost map image.
D. Demonstrator based on optimization
Since the main goal of the proposed method is to learn a near-optimal strategy, the natural choice of demonstrator is an optimization-based planner provided with a large amount of computational resources. In the present invention, we use a well-developed local planner as the demonstrator, which attempts to minimize a cost at each sampling time. Specifically, the demonstrator determines the action command for the mobile robot according to formulas (5a), (5b) and (5c), where (5b) and (5c) are constraint conditions;
xk+1=h(xk,u),k=0,...,N-1 (5b)
where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from its position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, p̄_obs is the maximum obstacle probability the robot is allowed to traverse, and w_1, w_2, w_3 are cost weight parameters. To simplify the calculation, the control action is assumed to be held constant over the prediction horizon.
During robot navigation, the problem defined by equations (5a), (5b) and (5c) must be solved again at each sampling time. However, the terms m_obs(x_k) and g(x_k, u) may involve non-linearities that make the problem difficult to solve efficiently. Conventional approaches use a sampling-based solver, which often settles for an approximate sub-optimal solution: at each planning cycle, a set of candidate trajectories within the prediction horizon is first generated and then evaluated according to the associated cost values. The quality of the resulting solution therefore depends heavily on the number and diversity of the candidate trajectories considered. For behavior demonstration purposes, we employ a planner with a large number of samples to provide a near-optimal reference strategy.
E. Data acquisition
As previously mentioned, an open challenge of imitation learning is the state distribution mismatch between the training data set and the learned strategy. To address this problem effectively and cover the possible observations as completely as possible, we collected training samples from two sources. The first is the normal experimental trajectories demonstrated by an expert, which the trained planner is expected to encounter most of the time. The second is artificially generated random cost maps that supply dangerous situations the demonstrator rarely encounters. The random cost map generation process is as follows: some binary obstacle clusters are first randomly generated and then Gaussian-blurred to obtain an obstacle probability map; subsequently, the local target points are randomly drawn on the map. Finally, we transform the cost maps into the robot coordinate frame so that the robot pose is the same across all data sets, which is considered beneficial for sample efficiency.
The invention develops an imitation learning algorithm that achieves real-time near-optimal local motion planning while maintaining good safety and generality for mobile robot applications. Unlike end-to-end modeling, we develop a local planning strategy that takes a preprocessed 2d local cost map as input. The local cost map may be constructed from local target points and obstacle maps containing multiple frames of information received from sensor devices such as lidar, sonar and depth cameras. At each sample time, our local motion planning model, which embeds a value iteration network, produces an action instruction through feed-forward inference, which is computationally efficient and enables planning-based reasoning. To train a robust model, we use a combination of real-world local obstacle maps collected from demonstration experiments and randomly generated artificial maps, which not only speeds up the data collection process but also supplements the dangerous observation samples rarely encountered in demonstrations. A brief comparison of the method presented herein with existing learning methods is summarized in table 1.
Table 1: comparison of local movement planning methods between local 2d mimic learning, existing end-to-end mimic learning, and traditional optimization-based methods.
The main contributions of the present application include at least:
1. Local motion planning is formulated as an imitation learning problem based on preprocessed 2d local cost map images. This formulation can incorporate Value Iteration Networks (VINs) and generalizes across a wide variety of sensor devices.
2. By randomly generating artificial local cost maps, the shortage of training data, particularly of scarce but dangerous event samples, is overcome.
In order to better illustrate the effectiveness and feasibility of the local motion planning method for the mobile robot, the local motion planning method for the mobile robot can be applied to a robot platform for experiments.
In the following, the experiments and evaluation of the local planner are described in detail.
A. Equipment
First, the robot platform used in the experiments was a Segway delivery robot (Loomo Go) equipped with an Intel RealSense sensor, ultrasonic sensors, and wheel encoders. The depth camera of the RealSense is used to maintain a fixed-size (2.8 m) local occupancy map with a resolution of 0.1 m.
Second, for model training, we collected over 600k tuples in total, half from demonstration trajectories and half from random generation. The data set was divided into a training set (80%) and a test set (20%); the ratio of the training set to the test set may be adjusted according to actual requirements. The demonstrator is an optimization-based planner that samples 11 grid points for linear velocities in the range [0.0 m/s, 0.5 m/s] and 81 grid points for angular velocities in the range [-0.8 rad/s, 0.8 rad/s]. The neural network model was implemented with the TensorFlow deep learning framework and trained from scratch on an Nvidia Titan X for about 8 hours with the Adam optimizer, an algorithm for first-order gradient-based optimization of stochastic objective functions. Evaluation and deployment were performed on a laptop with an Intel i7-6700HQ CPU running Ubuntu 14.04. Fig. 5 shows a schematic diagram of the Segway delivery robot, which is equipped with an Intel RealSense ZR300 (30 Hz RGB-depth, fisheye and IMU, where IMU stands for Inertial Measurement Unit), an Intel Atom Z8750 (4 cores, 2.4 GHz) and 4 GB of memory.
B. Model index
We first evaluate the trained model frame by frame, comparing the prediction accuracy on the training and test data sets and the performance of the proposed learning-based planner against the optimization-based planner.
1) Training and testing accuracy: Fig. 6 shows box plots of the absolute prediction error of the trained planner relative to the demonstrator. The errors of the linear and angular velocities are measured separately because they influence the planning result differently. In fig. 6, the lower and upper edges of each box represent the first and third quartiles, respectively; the line inside the box represents the median, and the points marked with "+" beyond the whiskers represent outliers. As can be seen from fig. 6, the first and third quartiles of the prediction error on both the training and test data sets are very small, and the training and testing accuracy of the trained planner is significantly higher than that of existing optimization-based planners. Here, the well-trained planner is the learning-based planner obtained with the local motion planning method for mobile robots according to the present invention.
Table 2 summarizes detailed indicators of the planning accuracy. It can be noted that the average error of linear and angular velocities on the test data set is as good as the training data set, indicating that the learned model has a strong generalization capability in the local planning task. We can also observe that the standard deviation of the errors on the test data set is higher than the training data set, indicating that some significant predictive outliers still exist.
Training-v [ m/s] | Training-w [ rad/s] | Test-v [ m/s] | Test-w [ rad/s] | |
Average | 0.0031 | 0.0115 | 0.0037 | 0.0151 |
Standard of merit | 0.0050 | 0.0132 | 0.0079 | 0.0308 |
Table 2: statistics of prediction errors
In addition to the average performance, we also investigated the cases in which large prediction errors occurred. Fig. 7 shows the action instructions provided by the trained planner and the demonstrator in three cases within different error ranges of the test data set. In fig. 7, the error of the left case is normal (0.0024), the error of the middle case is large (0.1477), and the error of the right case is very large (0.5733). The robot is located at the origin of the 2d occupancy map, facing north. Isolated gray cells are targets, while the darkness of the other cells represents the obstacle probability. The length of a line represents the linear velocity and its direction represents the angular velocity.
In the left case, the trained planner predicts an action that tends to keep some distance from the obstacle clusters on both sides, almost overlapping the decision provided by the demonstrator. In the middle case, the trained planner differs slightly from the demonstrator, which may be due to the ambiguity introduced by a target point hidden in an obstacle cluster. When the local target lags behind the robot, as in the right case, it is difficult for the trained planner to produce exactly the same action as the demonstrator. However, the decision of the trained planner is still considered reasonable, since it turns in the correct direction with a smoother change of speed.
2) Comparison with the optimization-based planner: one key motivation for the learning-based planner presented in the invention is to reduce computation time, which is a disadvantage of the traditional optimization-based approach. We therefore compare the performance of the trained planner with that of the optimization-based planner in a complex planning environment. As described in section D above, the number of candidate trajectories considered has a large impact on computation time and solution quality. In the following evaluations, we use the demonstrator with 11 × 81 linear and angular velocity samples as the baseline planner, and evaluate the performance of the learning-based planner and of optimization-based planners with different numbers of samples (5 × 11, 7 × 31, 8 × 51, 9 × 61, 10 × 71). The optimality gap of each decision is defined with respect to the baseline, where v and w are the linear and angular velocities obtained from the evaluated planner and the reference velocities are those of the baseline.
Fig. 8 shows a comparison of the optimality gap between the learning-based planner and the optimization-based planners. As can be seen from fig. 8, the optimality error of the learning-based planner is approximately equal to that of an optimization-based planner with 750 sample trajectories.
Fig. 9 shows a comparison of the computation times of the learning-based planner and the optimization-based planners. As can be seen from fig. 9, the computation time of the learning-based planner is approximately equal to that of an optimization-based planner with 160 sample trajectories.
As shown in figs. 8 and 9, for the optimization-based planner the average optimality gap decreases as the number of samples increases, but the computation time grows almost linearly. In contrast, the proposed learning-based planner provides solutions of highly competitive quality in around 22 ms, significantly faster than the optimization-based approach. Note that the better performance of the learning-based planner does not imply longer computation times; it is related to the quality of the demonstration behavior and of the training data set. If equation (5a) were solved closer to optimality when used as the demonstrator, the quality of the trained strategy would be expected to improve further.
C. Navigation in a simulated environment
Based on the accurate frame-by-frame action prediction, in this section we evaluate the performance of the trained planner in navigation simulations. To test the ability to avoid obstacles, the global reference path for robot navigation is set to approach or cross obstacles. At each sample time, the trained planner receives the local cost map and returns a velocity vector, which drives the robot to a new state according to the kinematic model. The resulting trajectories are shown in fig. 10, where the dotted line represents the global reference path, and the solid and dashed lines correspond to the trajectories of the learning-based planner and the optimization-based demonstrator, respectively. As can be seen from fig. 10, the trajectory of the trained planner successfully avoids obstacles when the global reference passes close to them, while smoothly following the reference in open space. Furthermore, the trajectory of the trained planner is nearly identical to the behavior of the demonstrator, which illustrates how well the trained planner imitates the near-optimal demonstrator.
D. Navigation in the real world
Finally, we deploy the trained model in real-world navigation experiments. Given a global reference path, the task of the Segway delivery robot is to follow the reference path while avoiding obstacles on the road. Two aspects are examined: the response to unexpected obstacles on the reference path, and the robustness of long-term operation.
1) Reaction to obstacles on the global path: fig. 11 shows a schematic diagram of the response of the trained planner to an unexpected obstacle on the reference path in a real-world experiment. In fig. 11, the upper part shows the local obstacle map fed to the planner, and the lower part is the corresponding view captured by the robot's front camera. The robot is located at the origin of the 2d local map, facing north. The length of the drawn line represents the linear velocity and its direction represents the angular velocity.
As shown in fig. 11, the robot makes local planning decisions based on the local occupancy map constructed online. Although this map is relatively small in scale, it provides rich information about the surroundings. When an obstacle blocking the global trajectory is encountered, the trained planner successfully provides the action instructions that steer the robot into the open area.
2) Long-term operation: long-term experiments were performed in narrow corridors filled with obstacles. The task of the robot is to recursively follow the global reference path without hitting any obstacles. Fig. 12 shows a schematic diagram of the navigation trajectories of two planners in a real-world environment: one trained on a hybrid of demonstration-trajectory data and artificially generated local cost maps, the other trained only on the normal data collected from the demonstration experiments. In fig. 12, the darkness of the 2d occupancy map is proportional to the obstacle probability, and the task of the robot is to navigate recursively in the office corridor with the trained planner without hitting an obstacle.
As can be seen from fig. 12, the robot running the latter planner touched several obstacles in crowded areas and required manual intervention for safety. In contrast, the planner trained with the mixed data set successfully drove the robot around obstacles and roamed the corridors in a robust and smooth manner.
In conclusion, the invention provides a preprocessed 2d local cost map and establishes, through deep imitation learning, a local motion planning system with obstacle-avoidance capability. By embedding a value iteration network and mixing actual and artificial cost map images during training, the developed local planner is highly competitive in decision quality, computation time and robustness.
Future work may be extended in the following two directions. First, the distribution of the data set and of the prediction error should be studied further, in order to generate, in a more efficient and tractable manner, artificial cost maps covering unseen contexts. Second, beyond inferring the local planning strategy, we consider extending the proposed model with additional outputs that predict the uncertainty of the action commands, which would be valuable for safety-critical real-world operation.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (17)
1. A method for planning local motion of a mobile robot, the method comprising:
determining a local target point and a local obstacle map according to a given global path and the surrounding environment where the mobile robot is located, and determining a 2d local cost map image according to the local target point and the local obstacle map;
determining a velocity of the mobile robot;
formulating, by a learning-based planner, action instructions for the mobile robot based on the velocity and the 2d local cost map image for execution by the mobile robot;
wherein formulating action instructions for the mobile robot through a learning-based planner comprises:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u=fθ(m,u') (1)
where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the current velocity vector of the mobile robot, θ is a model weight parameter, m is the 2d local cost map image, and f is the learning-based planner model.
2. The method of claim 1, wherein the determining a 2d local cost map image comprises:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
3. The method of claim 1 or 2, wherein determining the 2d local cost map comprises:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs (2)
wherein m_obs is the local obstacle map of obstacle probabilities, and m_goal is a binary target map in which the pixel value of the local target point is set to 1 and all other pixel values are set to 0; λ is a hyper-parameter serving as the reward coefficient; and if the nearest valid target point on the reference path lies outside the 2d local cost map window, the valid target point is replaced by its projection onto the map frame.
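The cost map construction of equation (2), including the replacement of an out-of-window target point by its projection onto the map frame, can be sketched as follows; representing that projection as a per-axis clip of the goal cell onto the map border is an assumption for illustration:

```python
import numpy as np

def build_cost_map(m_obs, goal_rc, lam=10.0):
    """Equation (2): m = lam * m_goal - m_obs, where m_obs holds obstacle
    probabilities and m_goal is a binary map with 1 at the local target
    point. goal_rc is the goal cell (row, col); if it lies outside the map
    window, it is projected onto the map border (assumed: per-axis clip)."""
    h, w = m_obs.shape
    r = int(np.clip(goal_rc[0], 0, h - 1))
    c = int(np.clip(goal_rc[1], 0, w - 1))
    m_goal = np.zeros_like(m_obs)
    m_goal[r, c] = 1.0
    return lam * m_goal - m_obs
```

The resulting image rewards the goal pixel by λ while penalizing each pixel by its obstacle probability, matching the "target reward plus obstacle penalty" structure of the claim.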
4. The method of claim 1 or 2, wherein determining the 2d local cost map comprises:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
5. The method of claim 1 or 2, wherein determining the model weight parameter θ comprises:
given a set of exemplary action instructions {(m_i, u'_i, u*_i)}, the planner trains the required local motion planning strategy based on the error minimization criterion of equation (3):

J(θ) = Σ_i |u*_i − f_θ(m_i, u'_i)|² (3)

wherein (m_i, u'_i, u*_i) is an exemplary tuple, and J(θ) is the error function: the cumulative sum of the squared absolute differences between the exemplary action instruction u*_i and the actual action command f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
6. The method of claim 1 or 2, wherein said formulating action instructions for the mobile robot by the learning-based planner comprises:
inputting a 2d local cost map image to a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operation;
wherein the recursive operation of the value iteration module follows equation (4):

v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u)·[r + γ·v_k(s')] (4)

wherein s and r correspond to the position and cost of a pixel in the 2d local cost map, respectively, and p(s', r | s, u) represents the transition probability; k represents the number of iterations; γ represents the attenuation factor; v_{k+1}(s) represents the value at position s obtained after k+1 iterations; v_k(s') represents the value at position s' after k iterations; u represents an action command executed by the mobile robot; and s' represents the position of the pixel in the 2d local cost map reached after the mobile robot executes action command u at position s;
selecting, from the high-level planning features extracted by formula (4), the high-level planning features corresponding to the region of interest;
and fusing the high-level planning features from the region of interest with the speed of the mobile robot, and feeding the fusion result to a fully connected layer of the deep neural network that generates smooth, continuous action commands, so as to obtain the action instructions formulated for the mobile robot.
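The recursion of the value iteration module in claim 6 can be sketched on a grid. Specializing equation (4) to deterministic 4-neighbour transitions (the transition probability concentrated on a single s' per action) is an assumption for illustration; the claimed module learns these quantities inside a neural network:

```python
import numpy as np

def shift(a, dr, dc, fill=-np.inf):
    """out[r, c] = a[r + dr, c + dc]; cells whose source index leaves the
    map are filled with `fill` (so out-of-map moves never win the max)."""
    h, w = a.shape
    out = np.full_like(a, fill)
    out[max(-dr, 0):h + min(-dr, 0), max(-dc, 0):w + min(-dc, 0)] = \
        a[max(dr, 0):h + min(dr, 0), max(dc, 0):w + min(dc, 0)]
    return out

def value_iteration(cost_map, k_iters=30, gamma=0.95):
    """Equation (4) with deterministic 4-neighbour moves:
    v_{k+1}(s) = max over moves u of [ r(s') + gamma * v_k(s') ]."""
    v = np.zeros_like(cost_map)
    for _ in range(k_iters):
        q = cost_map + gamma * v        # r(s') + gamma * v_k(s'), per pixel
        v = np.max([shift(q, dr, dc) for dr, dc in
                    ((1, 0), (-1, 0), (0, 1), (0, -1))], axis=0)
    return v
```

On a cost map with a single rewarded goal pixel, the value surface decays away from the goal, which is the kind of high-level planning feature the module propagates.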
7. The method of claim 1 or 2, wherein the method further comprises:
selecting an optimization-based planner that can provide a preset number of computing resources as a demonstrator;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
x_{k+1} = h(x_k, u), k = 0, ..., N−1 (5b)
where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, the bound in constraint (5c) is the maximum obstacle probability allowed to be traversed, and w_1, w_2, w_3 are cost weight parameters.
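A sampling-based stand-in for the optimization-based demonstrator of (5a)-(5c) can be sketched as follows (the description elsewhere evaluates this planner by its number of sample trajectories). Since the objective (5a) is not reproduced in the text, the terminal cost in d_g, α_g and control effort with weights w_1, w_2, w_3, and the reading of (5c) as an obstacle-probability bound along the rollout, are assumptions:

```python
import numpy as np

def demo_planner(pose, goal, obs_prob, n_samples=160, N=10, dt=0.1,
                 p_max=0.5, w1=1.0, w2=0.5, w3=0.1, seed=0):
    """Sample candidate (v, w) commands, roll each out over the prediction
    horizon with the kinematic model (5b), reject rollouts whose obstacle
    probability exceeds p_max (assumed form of (5c)), and return the command
    minimizing an assumed cost w1*d_g + w2*alpha_g + w3*||u||^2."""
    rng = np.random.default_rng(seed)
    best_u, best_cost = np.zeros(2), np.inf
    for _ in range(n_samples):
        v, w = rng.uniform(0.0, 1.0), rng.uniform(-1.0, 1.0)
        x = np.array(pose, dtype=float)
        feasible = True
        for _ in range(N):                      # x_{k+1} = h(x_k, u)
            x[0] += v * np.cos(x[2]) * dt
            x[1] += v * np.sin(x[2]) * dt
            x[2] += w * dt
            if obs_prob(x[0], x[1]) > p_max:    # obstacle-probability bound
                feasible = False
                break
        if not feasible:
            continue
        d_g = np.hypot(goal[0] - x[0], goal[1] - x[1])
        alpha_g = abs(np.arctan2(goal[1] - x[1], goal[0] - x[0]) - x[2])
        cost = w1 * d_g + w2 * alpha_g + w3 * (v * v + w * w)
        if cost < best_cost:
            best_u, best_cost = np.array([v, w]), cost
    return best_u
```

In free space toward a goal straight ahead, the selected command drives forward; if every sampled rollout violates the obstacle bound, the sketch falls back to a zero command.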
8. The method of claim 7, wherein the acquisition sources of training samples for the learning-based planner comprise:
generating a first cost map according to normal experiment trajectories of the demonstrator, wherein a normal experiment trajectory of the demonstrator is a trajectory that the learning-based planner is expected to encounter with a probability within a predetermined proportional range;
and a second cost map generated with human intervention, containing unexpected dangerous situations for the demonstrator, for compensating for the low-probability events that the learning-based planner will encounter but that the demonstrator overlooks.
9. A mobile robot local motion planning apparatus, the apparatus comprising:
the preprocessor is used for determining a local target point and a local obstacle map according to a given global path and the surrounding environment where the mobile robot is located, and determining a plane 2d local cost map image according to the local target point and the local obstacle map; determining a velocity of the mobile robot;
a controller for formulating action instructions for the mobile robot by a learning-based planner based on the velocity and the 2d local cost map image for execution by the mobile robot;
wherein, the controller is specifically configured to:
given the 2d local cost map image and the mobile robot speed, the following action commands are provided according to equation (1):
u=fθ(m,u') (1)
where u = (v, w) is the vector of linear velocity v and angular velocity w to be executed, u' is the current velocity vector of the mobile robot, θ is a model weight parameter, m is the 2d local cost map image, and f is the learning-based planner model.
10. The apparatus of claim 9, wherein the preprocessor is specifically configured to:
acquiring data acquired by a preset sensor on the mobile robot;
positioning the mobile robot based on the data, and simultaneously establishing a surrounding environment map where the mobile robot is located;
determining a local target point and a local obstacle map according to a given global path and the surrounding environment map;
and determining a 2d local cost map image according to the local target point and the local obstacle map.
11. The apparatus according to claim 9 or 10, wherein the preprocessor is specifically configured to:
determining a 2d local cost map according to equation (2), wherein the 2d local cost map is constructed as a sum of local target reward and obstacle penalty:
m = λ·m_goal − m_obs (2)
wherein m_obs is the local obstacle map of obstacle probabilities, and m_goal is a binary target map in which the pixel value of the local target point is set to 1 and all other pixel values are set to 0; λ is a hyper-parameter serving as the reward coefficient; and if the nearest valid target point on the reference path lies outside the 2d local cost map window, the valid target point is replaced by its projection onto the map frame.
12. The apparatus according to claim 9 or 10, wherein the preprocessor is specifically configured to:
inputting a given 2d local cost map image and a given mobile robot speed into a preset neural network model, and obtaining a 2d local cost map through learning of the preset neural network model; wherein, the preset neural network model is a neural network model omitting convolution layers.
13. The apparatus of claim 9 or 10, wherein the controller determines the model weight parameter θ in a manner comprising:
given a set of exemplary action instructions {(m_i, u'_i, u*_i)}, the planner trains the required local motion planning strategy based on the error minimization criterion of equation (3):

J(θ) = Σ_i |u*_i − f_θ(m_i, u'_i)|² (3)

wherein (m_i, u'_i, u*_i) is an exemplary tuple, and J(θ) is the error function: the cumulative sum of the squared absolute differences between the exemplary action instruction u*_i and the actual action command f_θ(m_i, u'_i);
once training is complete, the model weight parameter θ is fixed and entered into equation (1) during deployment, such that equation (1) calculates an action command based on the model weight parameter θ.
14. The apparatus of claim 9 or 10, wherein the controller is specifically configured to:
inputting a 2d local cost map image to a deep neural network embedded with a value iteration module, wherein the value iteration module extracts high-level planning features through recursive operation;
wherein the recursive operation of the value iteration module follows equation (4):

v_{k+1}(s) = max_u Σ_{s',r} p(s', r | s, u)·[r + γ·v_k(s')] (4)

wherein s and r correspond to the position and cost of a pixel in the 2d local cost map, respectively, and p(s', r | s, u) represents the transition probability; k represents the number of iterations; γ represents the attenuation factor; v_{k+1}(s) represents the value at position s obtained after k+1 iterations; v_k(s') represents the value at position s' after k iterations; u represents an action command executed by the mobile robot; and s' represents the position of the pixel in the 2d local cost map reached after the mobile robot executes action command u at position s;
selecting, from the high-level planning features extracted by formula (4), the high-level planning features corresponding to the region of interest;
and fusing the high-level planning features from the region of interest with the speed of the mobile robot, and feeding the fusion result to a fully connected layer of the deep neural network that generates smooth, continuous action commands, so as to obtain the action instructions formulated for the mobile robot.
15. The apparatus of claim 9 or 10, wherein the controller is further configured to:
selecting an optimization-based planner that can provide a preset number of computing resources as a demonstrator;
the demonstrator determines a motion command for the mobile robot according to formulas (5a), (5b) and (5c), wherein formulas (5b) and (5c) are constraint conditions;
x_{k+1} = h(x_k, u), k = 0, ..., N−1 (5b)
where N is the length of the prediction horizon, x_k is the 2d pose of the mobile robot at time step k, d_g is the distance between the mobile robot and the local target point at time step N, α_g is the absolute angle between the mobile robot's orientation and the direction from the mobile robot's position to the local target point at time step N, h(x_k, u) is the kinematic model of the robot, the bound in constraint (5c) is the maximum obstacle probability allowed to be traversed, and w_1, w_2, w_3 are cost weight parameters.
16. The apparatus of claim 15, wherein the controller is further configured to:
generating a first cost map according to normal experiment trajectories of the demonstrator, wherein a normal experiment trajectory of the demonstrator is a trajectory that the learning-based planner is expected to encounter with a probability within a predetermined proportional range;
a second cost map generated with human intervention, containing unexpected dangerous situations for the demonstrator, for compensating for the low-probability events that the learning-based planner will encounter but that the demonstrator overlooks;
and using the first cost map and the second cost map as training samples of the learning-based planner.
17. A computer storage medium having computer-executable instructions stored thereon for performing the method of local motion planning for a mobile robot of any of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710987041.8A CN107861508B (en) | 2017-10-20 | 2017-10-20 | Local motion planning method and device for mobile robot |
PCT/CN2018/087326 WO2019076044A1 (en) | 2017-10-20 | 2018-05-17 | Mobile robot local motion planning method and apparatus and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710987041.8A CN107861508B (en) | 2017-10-20 | 2017-10-20 | Local motion planning method and device for mobile robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107861508A CN107861508A (en) | 2018-03-30 |
CN107861508B true CN107861508B (en) | 2021-04-20 |
Family
ID=61697686
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710987041.8A Active CN107861508B (en) | 2017-10-20 | 2017-10-20 | Local motion planning method and device for mobile robot |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107861508B (en) |
WO (1) | WO2019076044A1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107861508B (en) * | 2017-10-20 | 2021-04-20 | 纳恩博(北京)科技有限公司 | Local motion planning method and device for mobile robot |
CN110858328B (en) * | 2018-08-06 | 2022-06-14 | 纳恩博(北京)科技有限公司 | Data acquisition method and device for simulating learning and storage medium |
CN109358618B (en) * | 2018-09-28 | 2021-10-26 | 安徽工程大学 | Path planning method of mobile robot |
CN110046457B (en) * | 2019-04-26 | 2021-02-05 | 百度在线网络技术(北京)有限公司 | Human body model control method and device, electronic equipment and storage medium |
CN111912407B (en) * | 2019-05-08 | 2022-05-17 | 胡贤良 | Path planning method of multi-robot system |
CN110285813B (en) * | 2019-07-01 | 2022-11-25 | 东南大学 | Man-machine co-fusion navigation device and method for indoor mobile robot |
CN110503065B (en) * | 2019-08-28 | 2023-04-07 | 南京大学 | Mobile equipment user action gesture recognition method based on distance measurement |
JP7221839B2 (en) * | 2019-10-08 | 2023-02-14 | 国立大学法人静岡大学 | Autonomous Mobile Robot and Control Program for Autonomous Mobile Robot |
CN112783147A (en) * | 2019-11-11 | 2021-05-11 | 科沃斯机器人股份有限公司 | Trajectory planning method and device, robot and storage medium |
CN111079603A (en) * | 2019-12-06 | 2020-04-28 | 青岛歌尔智能传感器有限公司 | Step prediction method, controller, positioning device and readable storage medium |
CN111288995B (en) * | 2020-03-12 | 2022-05-13 | 深圳市人工智能与机器人研究院 | Route planning method and route planning device of mobile robot and terminal equipment |
CN111739099B (en) * | 2020-07-20 | 2020-12-11 | 北京云迹科技有限公司 | Falling prevention method and device and electronic equipment |
WO2022027199A1 (en) * | 2020-08-03 | 2022-02-10 | 深圳市大疆创新科技有限公司 | Control method for movable platform, movable platform and storage medium |
CN113741480B (en) * | 2021-09-16 | 2024-06-28 | 中科南京软件技术研究院 | Obstacle avoidance method based on combination of dynamic obstacle extraction and cost map |
CN113805483B (en) * | 2021-09-17 | 2022-07-12 | 中国人民解放军国防科技大学 | Robot control method and device based on model prediction and computer equipment |
CN114237242B (en) * | 2021-12-14 | 2024-02-23 | 北京云迹科技股份有限公司 | Method and device for controlling robot based on optical encoder |
CN114355923B (en) * | 2021-12-28 | 2024-04-02 | 杭州电子科技大学 | MPC-based track planning and tracking method under A-guidance |
CN115167434B (en) * | 2022-07-21 | 2024-06-28 | 清华大学深圳国际研究生院 | Local navigation obstacle avoidance method and robot |
CN115421494A (en) * | 2022-09-19 | 2022-12-02 | 西安交通大学 | Cleaning robot path planning method, system, computer device and storage medium |
CN115542901B (en) * | 2022-09-21 | 2024-06-07 | 北京航空航天大学 | Deformable robot obstacle avoidance method based on near-end strategy training |
CN116911176B (en) * | 2023-07-08 | 2024-04-30 | 哈尔滨理工大学 | Terrain trafficability prediction method based on speed and vibration state of wheeled mobile robot |
CN117232531B (en) * | 2023-11-14 | 2024-01-30 | 长沙小钴科技有限公司 | Robot navigation planning method, storage medium and terminal equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130112507A (en) * | 2012-04-04 | 2013-10-14 | 인하대학교 산학협력단 | Safe path planning method of a mobile robot using s× algorithm |
CN106325275A (en) * | 2016-09-14 | 2017-01-11 | 广州今甲智能科技有限公司 | Robot navigation system, robot navigation method and robot navigation device |
CN106774347A (en) * | 2017-02-24 | 2017-05-31 | 安科智慧城市技术(中国)有限公司 | Robot path planning method, device and robot under indoor dynamic environment |
CN106774327A (en) * | 2016-12-23 | 2017-05-31 | 中新智擎有限公司 | A kind of robot path planning method and device |
WO2017095591A1 (en) * | 2015-12-02 | 2017-06-08 | Qualcomm Incorporated | Simultaneous mapping and planning by a robot |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104573140A (en) * | 2013-10-09 | 2015-04-29 | 北京军区军事训练模拟仿真研发服务中心 | Layered dynamic path planning method applied to virtual simulation |
CN105629974B (en) * | 2016-02-04 | 2018-12-04 | 重庆大学 | A kind of robot path planning method and system based on modified Artificial Potential Field Method |
CN105955280A (en) * | 2016-07-19 | 2016-09-21 | Tcl集团股份有限公司 | Mobile robot path planning and obstacle avoidance method and system |
CN107861508B (en) * | 2017-10-20 | 2021-04-20 | 纳恩博(北京)科技有限公司 | Local motion planning method and device for mobile robot |
- 2017-10-20: CN CN201710987041.8A patent/CN107861508B/en active Active
- 2018-05-17: WO PCT/CN2018/087326 patent/WO2019076044A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130112507A (en) * | 2012-04-04 | 2013-10-14 | 인하대학교 산학협력단 | Safe path planning method of a mobile robot using s× algorithm |
WO2017095591A1 (en) * | 2015-12-02 | 2017-06-08 | Qualcomm Incorporated | Simultaneous mapping and planning by a robot |
CN106325275A (en) * | 2016-09-14 | 2017-01-11 | 广州今甲智能科技有限公司 | Robot navigation system, robot navigation method and robot navigation device |
CN106774327A (en) * | 2016-12-23 | 2017-05-31 | 中新智擎有限公司 | A kind of robot path planning method and device |
CN106774347A (en) * | 2017-02-24 | 2017-05-31 | 安科智慧城市技术(中国)有限公司 | Robot path planning method, device and robot under indoor dynamic environment |
Also Published As
Publication number | Publication date |
---|---|
CN107861508A (en) | 2018-03-30 |
WO2019076044A1 (en) | 2019-04-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107861508B (en) | Local motion planning method and device for mobile robot | |
Chen et al. | Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning | |
Chen et al. | Deep imitation learning for autonomous driving in generic urban scenarios with enhanced safety | |
Dai et al. | Fast frontier-based information-driven autonomous exploration with an mav | |
EP3788549B1 (en) | Stacked convolutional long short-term memory for model-free reinforcement learning | |
Min et al. | RNN-based path prediction of obstacle vehicles with deep ensemble | |
Grigorescu et al. | Neurotrajectory: A neuroevolutionary approach to local state trajectory learning for autonomous vehicles | |
CN114846425A (en) | Prediction and planning of mobile robots | |
Zhu et al. | Off-road autonomous vehicles traversability analysis and trajectory planning based on deep inverse reinforcement learning | |
Liu et al. | Map-based deep imitation learning for obstacle avoidance | |
CN103901891A (en) | Dynamic particle tree SLAM algorithm based on hierarchical structure | |
Paz et al. | Tridentnet: A conditional generative model for dynamic trajectory generation | |
Wiedemann et al. | Robotic information gathering with reinforcement learning assisted by domain knowledge: An application to gas source localization | |
Agarwal et al. | Imitative planning using conditional normalizing flow | |
Ramezani et al. | UAV path planning employing MPC-reinforcement learning method considering collision avoidance | |
Kim et al. | Learning forward dynamics model and informed trajectory sampler for safe quadruped navigation | |
Jacinto et al. | Navigation of autonomous vehicles using reinforcement learning with generalized advantage estimation | |
WO2023242223A1 (en) | Motion prediction for mobile agents | |
Rezaei et al. | A deep learning-based approach for vehicle motion prediction in autonomous driving | |
EP3839830A1 (en) | Trajectory estimation for vehicles | |
Artuñedo et al. | Machine learning based motion planning approach for intelligent vehicles | |
Kvasnikov et al. | Designing a computerized information processing system to build a movement trajectory of an unmanned aircraft vehicle | |
Xue et al. | Combining Motion Planner and Deep Reinforcement Learning for UAV Navigation in Unknown Environment | |
Daryina et al. | Unmanned vehicle’s control real-time method based on neural network and selection function | |
Natan et al. | DeepIPC: Deeply integrated perception and control for an autonomous vehicle in real environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||