CN109409592A

CN109409592A - Optimal policy solving method for a mobile robot in a dynamic environment

Info

Publication number: CN109409592A
Application number: CN201811196536.XA
Authority: CN
Inventors: 欧林林; 范振雍; 禹鑫燚; 陆文祥
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2018-10-15
Filing date: 2018-10-15
Publication date: 2019-03-01
Anticipated expiration: 2038-10-15
Also published as: CN109409592B

Abstract

An optimal policy solving method for a mobile robot in a dynamic environment comprises the following steps. First, an improved weighted transition system is constructed from the robot's operating environment, the task requirements are expressed mathematically in linear temporal logic (LTL), and the LTL task formula is converted into a Büchi automaton using the LTL2BA toolkit. Next, the Cartesian product of the two is taken to obtain a product automaton that contains both the task requirements and the environment information. Useless points are then removed from the task-feasible network topology graph, and the availability of each state point is further judged according to dual labels and a behavior-constraint criterion, which reduces the number of state points. The remaining points are built into an MDP model, and the optimal policy is obtained by policy iteration. The invention not only handles cases in which no DRA exists, but also reduces the number of usable points, so that the constructed MDP is less complex and the optimal policy is obtained faster.

Description

Optimal policy solving method for a mobile robot in a dynamic environment

Technical field

The present invention relates to a method for generating an optimal policy for a mobile robot in a dynamic environment.

Background art

In recent years, with the development of science and technology, the demand for intelligent robots in production and daily life has grown steadily, and the required level of robot intelligence has risen with it. Applications of intelligent robots necessarily involve robot motion, i.e., path planning. Existing path planning methods such as genetic algorithms, particle swarm optimization, ant colony optimization, and simulated annealing all plan an optimal path in a static environment from a given operating environment, and all search for the path under the assumption that each single step is deterministic. Methods such as artificial neural network algorithms, heuristic search algorithms, and sampling-based path planning algorithms can handle dynamically changing environments, but they still cannot complete the task well when the task is complex and a single step offers multiple choices. Path planning methods for mobile robots based on linear temporal logic (LTL) use LTL task formulas to describe complex task requirements in practical applications, and fuse the environment information with the task information to guarantee that the path found is optimal while satisfying both the environment information and the task requirements. When a single step offers multiple choices, however, what is needed is not an optimal path but an optimal policy that satisfies the task requirements.

To solve the above problem, the traditional solution is the LTL-DRA (deterministic Rabin automaton) method, which combines the environment information with a DRA. It guarantees that the resulting optimal policy completes the given task requirements, at a lower search cost than the dynamic planning algorithms mentioned above. Using a DRA has a drawback, however: in some cases an LTL formula cannot be converted into a DRA, so the traditional solution cannot handle every case. In addition, the MDP obtained by the traditional method still contains redundant states that can be further reduced.

The NBA (nondeterministic Büchi automaton) is adopted to address this limitation of the DRA: an NBA guarantees that every LTL formula can be converted into an automaton graph, which facilitates subsequent operations. For the case in which a single step offers multiple choices, the model is constructed as an MDP (Markov decision process), and the optimal policy is obtained by policy iteration.

Summary of the invention

The present invention overcomes the above problems of the prior art by providing an optimal policy solving method for a mobile robot in a dynamic environment.

The present invention describes complex task requirements using linear temporal logic (LTL), replaces the traditional DRA with an NBA to convert the LTL formula into a graph representation, and uses dual labels and a behavior-constraint criterion to remove redundant, unusable states, which simplifies solving the MDP problem. The flow of the invention is shown in Fig. 1. First, an improved weighted transition system is constructed from the robot's operating environment, the task requirements are expressed mathematically in linear temporal logic (LTL), and the LTL task formula is converted into a Büchi automaton using the LTL2BA toolkit. Next, the Cartesian product of the two is taken to obtain a product automaton that contains both the task requirements and the environment information. Useless points (points with only incoming or only outgoing transitions) are removed from the task-feasible network topology graph, and the availability of each state point is then further judged according to the dual labels and the behavior-constraint criterion, which reduces the number of state points. The remaining points are built into an MDP model, and the optimal policy is obtained by policy iteration. The method not only handles cases in which no DRA exists, but also reduces the number of usable points, so that the constructed MDP is less complex and the optimal policy is obtained faster.

The optimal policy solving method for a mobile robot in a dynamic environment according to the invention comprises the following steps:

Step 1: construct the improved weighted transition system.

The environment in which the robot operates is modeled as an improved weighted transition system. A weighted transition system is a model of the environment, defined as a tuple $T := (Q, q_0, R, \Pi, L, w_T)$, where $Q$ is a finite set of states, namely the nodes selected in the environment; $q_0 \in Q$ is the initial state, i.e., the robot's starting point; $R: Q \to 2^Q$ is the transition relation, which gives the connectivity between states (between path points); $\Pi$ is the set of atomic propositions, i.e., the actions to be completed at each state point; $L: Q \to 2^\Pi$ is the labeling function; and $w_T$ is the transition weight, which serves as the metric value, i.e., the second label. Atomic propositions represent the attributes of each state: $\pi \in L(q)$ holds if and only if the atomic proposition $\pi$ is true at state $q$. If $q_2 \in R(q_1)$, then $q_2$ is a successor of $q_1$. Any trajectory $r_T$ of the weighted transition system consists of states of $T$, i.e., $r_T = q_0 q_1 q_2 \ldots$, where $q_{i+1} \in R(q_i)$ holds for every $i \ge 0$. The trajectory $r_T$ generates a label sequence $o = o_1 o_2 o_3 \ldots$, where $o_i \in L(q_i)$. Fig. 2 shows the MDP process of a robot; built into a weighted transition system, it becomes Fig. 3, in which the pickup action is executed at $q_0$ and the dropoff action at $q_9$.
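As an illustration of the tuple defined above, the following minimal Python sketch stores the six components and checks whether a sequence of states is a valid trajectory. The class name, the field names, and the three-node corridor are hypothetical and chosen only for this example; they are not taken from the figures.

```python
from dataclasses import dataclass


@dataclass
class WeightedTransitionSystem:
    """T = (Q, q0, R, Pi, L, w_T), with hypothetical field names."""
    states: set          # Q
    initial: str         # q0
    transitions: dict    # R: state -> set of successor states
    propositions: set    # Pi
    labels: dict         # L: state -> set of propositions true at that state
    weights: dict        # w_T: (state, successor) -> transition weight

    def is_trajectory(self, run):
        """True if the run starts at q0 and every step follows R."""
        if not run or run[0] != self.initial:
            return False
        return all(b in self.transitions.get(a, set())
                   for a, b in zip(run, run[1:]))

    def trace(self, run):
        """Label sets L(q_i) visited along the run."""
        return [self.labels.get(q, set()) for q in run]


# Hypothetical corridor q0 - q1 - q2 (not the environment of Fig. 3).
T = WeightedTransitionSystem(
    states={"q0", "q1", "q2"},
    initial="q0",
    transitions={"q0": {"q1"}, "q1": {"q0", "q2"}, "q2": {"q1"}},
    propositions={"pickup", "dropoff"},
    labels={"q0": {"pickup"}, "q2": {"dropoff"}},
    weights={("q0", "q1"): 1, ("q1", "q0"): 1,
             ("q1", "q2"): 1, ("q2", "q1"): 1},
)
```

Storing R as a dictionary of successor sets mirrors the signature $R: Q \to 2^Q$ directly, so the successor test $q_2 \in R(q_1)$ becomes a set membership check.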

Step 2: express the complex task mathematically.

Complex tasks can be expressed mathematically using linear temporal logic. Linear temporal logic (LTL) is a high-level language close to natural language. By combining the temporal operators G (always), F (eventually), X (next), and U (until) with the Boolean operators $\neg$ (not), $\wedge$ (and), $\vee$ (or), $\rightarrow$ (implication), and $\leftrightarrow$ (equivalence), it can precisely describe the complex tasks of a mobile robot. An example is the following task formula $\varphi$.

This task formula states that after pickup, the robot must reach dropoff before it can return to pickup; likewise, after dropoff, the robot must pass through pickup before it can return to dropoff.
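The alternation requirement described above can be checked on a finite prefix of a label sequence with a few lines of Python. This is only a hedged sketch: it is not the LTL formula itself and not a model checker, and the function name and the convention that staying at the same location does not count as returning are assumptions made for this example.

```python
def respects_alternation(trace):
    """Check a finite trace of label sets for strict pickup/dropoff alternation.

    Returns False as soon as the robot comes back to the location it visited
    last (pickup or dropoff) without having visited the other one in between.
    """
    last_event = None   # most recent of "pickup" / "dropoff" that was visited
    inside = False      # True while the robot is still at that location
    for labels in trace:
        event = next((e for e in ("pickup", "dropoff") if e in labels), None)
        if event is None:
            inside = False
            continue
        if event == last_event and not inside:
            return False
        last_event, inside = event, True
    return True


good = [{"pickup"}, set(), {"dropoff"}, set(), {"pickup"}]
bad = [{"pickup"}, set(), {"pickup"}]
```

Here `respects_alternation(good)` holds, while `respects_alternation(bad)` fails because the robot returns to pickup without reaching dropoff first.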

Step 3: generate the Büchi automaton.

To combine the environment information with the task information, the LTL task formula $\varphi$ is converted by the LTL2BA toolkit into a task feasibility graph, i.e., a Büchi automaton; the automaton obtained for the task formula above is shown in Fig. 4. A Büchi automaton is a five-tuple $B := (S_B, S_{B0}, \Sigma_B, \delta_B, F_B)$, where $S_B$ is a finite set of states; $S_{B0} \subseteq S_B$ is the set of initial states; $\Sigma_B$ is the input alphabet; $\delta_B \subseteq S_B \times \Sigma_B \times S_B$ is the transition relation; and $F_B \subseteq S_B$ is the set of accepting states.
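A compact way to hold such a five-tuple in code is sketched below. It assumes that the alphabet is the set of all label sets over the atomic propositions and that each transition carries a guard, a Python predicate over a label set. The class, the guard helpers, and the two-state automaton are hypothetical and hand-written for illustration; they are not output of LTL2BA and not the automaton of Fig. 4.

```python
from dataclasses import dataclass


@dataclass
class BuchiAutomaton:
    """B = (S_B, S_B0, Sigma_B, delta_B, F_B); Sigma_B is implicit in the guards."""
    states: set      # S_B
    initial: set     # S_B0
    delta: dict      # state -> list of (guard, next_state) pairs
    accepting: set   # F_B

    def step(self, state, labels):
        """delta_B(state, labels): all successors whose guard accepts the labels."""
        return {nxt for guard, nxt in self.delta.get(state, []) if guard(labels)}

    def reachable_after(self, word):
        """States reachable from S_B0 after reading a finite word of label sets."""
        current = set(self.initial)
        for labels in word:
            current = {n for s in current for n in self.step(s, labels)}
        return current


def always(labels):
    return True


def sees_dropoff(labels):
    return "dropoff" in labels


# Hypothetical nondeterministic automaton: s1 is entered whenever dropoff is read.
B = BuchiAutomaton(
    states={"s0", "s1"},
    initial={"s0"},
    delta={"s0": [(always, "s0"), (sees_dropoff, "s1")],
           "s1": [(always, "s0")]},
    accepting={"s1"},
)
```

Because `step` returns a set, the same structure covers both deterministic and nondeterministic automata, which matches the use of an NBA in this method.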

Step 4: construct the task-feasible network topology graph.

Taking the Cartesian product of the weighted transition system and the Büchi automaton yields the task-feasible network topology graph $P$, which contains both the environment information and the task information, i.e., $P = T \times B$. $P$ is a tuple $(S_P, S_{P0}, \delta_P, w_P, F_P)$, where $S_P = Q \times S_B$ is the finite set of states; $\delta_P$ is the transition function, defined by $(q_j, s_l) \in \delta_P((q_i, s_k))$ if and only if $q_j \in R(q_i)$ and $s_l \in \delta_B(s_k, L(q_i))$; $w_P$ is the weight inherited from $T$, i.e., whenever $(q_j, s_l) \in \delta_P((q_i, s_k))$, $w_P((q_i, s_k), (q_j, s_l)) = w_T(q_i, q_j)$; and $F_P = Q \times F_B$ is the set of accepting states. Selecting the useful points of the task-feasible network topology graph to construct the MDP guarantees that the resulting decision policy satisfies both the environment information and the task requirements.
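The product definition above translates almost line by line into code. The sketch below assumes that $R$, $L$, and $w_T$ are given as dictionaries and that $\delta_B$ is given as a function from a state and a label set to a set of successor states; the function name `build_product` and the two-node example are hypothetical.

```python
from itertools import product as cartesian


def build_product(Q, R, L, w_T, S_B, delta_B, F_B):
    """Return (delta_P, w_P, F_P) for P = T x B.

    (q_j, s_l) is a successor of (q_i, s_k) iff q_j in R(q_i)
    and s_l in delta_B(s_k, L(q_i)); weights are inherited from w_T.
    """
    delta_P, w_P = {}, {}
    for q_i, s_k in cartesian(Q, S_B):
        successors = set()
        for q_j in R.get(q_i, ()):
            for s_l in delta_B(s_k, L.get(q_i, frozenset())):
                successors.add((q_j, s_l))
                w_P[(q_i, s_k), (q_j, s_l)] = w_T[q_i, q_j]
        delta_P[(q_i, s_k)] = successors
    F_P = {(q, s) for q in Q for s in F_B}   # F_P = Q x F_B
    return delta_P, w_P, F_P


# Hypothetical two-node environment and two-state automaton.
Q = {"a", "b"}
R = {"a": {"b"}, "b": {"a"}}
L = {"b": {"goal"}}
w_T = {("a", "b"): 2, ("b", "a"): 3}
S_B, F_B = {"s0", "s1"}, {"s1"}


def delta_B(s, labels):
    if s == "s0":
        return {"s0", "s1"} if "goal" in labels else {"s0"}
    return {"s0"}


delta_P, w_P, F_P = build_product(Q, R, L, w_T, S_B, delta_B, F_B)
```

Note that the automaton reads the label of the state being left, $L(q_i)$, exactly as in the definition of $\delta_P$ above.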

Step 5: delete state points.

On the task-feasible network topology graph obtained in Step 4, useless points are rejected first, namely unreachable points that have only incoming or only outgoing transitions. Such points cannot be used, because selecting them would interrupt the policy and make it impossible to obtain an optimal result. Dual labels are then introduced. The first label is the state tag: before a turning point, the state tag of a state must agree with that of the previous state. For example, if the state at P1 is pickup and the turning point is at P10, then the states P2 to P9 all carry the tag pickup; likewise, after P10 and before P1, the states P9 to P2 all carry the tag dropoff. States with different tags cannot be connected, that is, a state can have only one state tag. The second label is the metric value, here chosen as distance, i.e., the distance between each state and the other states. The behavior-constraint criterion is inspired by legal constraints in real life: if a next state of the robot would make the metric value worse than at the current state (farther from the target point), that next state is rejected. For example, suppose the robot performs its task at q1 and must then move toward P10, and that it has reached q4, from which two next states, q5 and q6, can be selected. The metric value of q4 is 3, i.e., it is 3 steps from the target point, whereas q5 is 4 steps and q6 is 2 steps from the target point; q5 is therefore discarded and q6 is selected. Dual labels and the behavior-constraint criterion thus further simplify the task-feasible topology. As shown in Fig. 5, unusable state points are removed by the dual labels and the behavior-constraint criterion, and the remaining points are used to construct the MDP.
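The behavior-constraint example above (q4 at 3 steps, q5 at 4 steps, q6 at 2 steps) can be reproduced with a small filter. This is a sketch under one assumption that the text does not settle: candidates whose metric equals that of the current state are kept. The function name and the dictionary layout are hypothetical.

```python
def apply_behavior_constraint(current, candidates, metric):
    """Keep only next states that do not move farther from the target.

    metric maps each state to its distance (in steps) from the target point.
    Assumption: a candidate with the same distance as the current state is kept.
    """
    return [c for c in candidates if metric[c] <= metric[current]]


# Distances taken from the example: q4 is 3 steps, q5 is 4 steps, q6 is 2 steps.
metric = {"q4": 3, "q5": 4, "q6": 2}
kept = apply_behavior_constraint("q4", ["q5", "q6"], metric)
```

With these numbers `kept` contains only q6, matching the choice made in the example.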

Step 6: construct the Markov decision model.

An MDP is constructed from the remaining state points. The five-tuple $M := (T, S, A(i), p(\cdot \mid i, a), r(i, a))$ is called a Markov decision process (MDP), where the time points at which actions are chosen are called decision epochs and $T$ denotes the set of all decision epochs; $S$ is the finite state space; the set $A(i)$ of actions available at state $i$ is called the action space; $p(\cdot \mid i, a)$ is the probability distribution of the system state at the next decision epoch; and $r(i, a)$ is the reward obtained by the decision maker. Once the MDP has been constructed, it is solved with the policy iteration algorithm to obtain the optimal policy.
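For reference, a textbook form of policy iteration on a finite MDP is sketched below. The text does not specify the horizon, the discount factor, or the reward values, so the infinite-horizon discounted criterion with `gamma = 0.9`, the iterative policy evaluation, and the two-state example are all assumptions made for illustration.

```python
def policy_iteration(states, actions, P, r, gamma=0.9, tol=1e-8):
    """Policy iteration for a finite MDP (discounted, infinite horizon).

    actions[i]  -> list of actions available at state i, i.e. A(i)
    P[(i, a)]   -> dict mapping next state j to probability p(j | i, a)
    r[(i, a)]   -> immediate reward r(i, a)
    """
    policy = {i: actions[i][0] for i in states}
    V = {i: 0.0 for i in states}

    def q_value(i, a):
        return r[i, a] + gamma * sum(p * V[j] for j, p in P[i, a].items())

    while True:
        # Policy evaluation: sweep until the value function stops changing.
        while True:
            delta = 0.0
            for i in states:
                v_new = q_value(i, policy[i])
                delta = max(delta, abs(v_new - V[i]))
                V[i] = v_new
            if delta < tol:
                break
        # Policy improvement: switch to a strictly better action if one exists.
        stable = True
        for i in states:
            best = max(actions[i], key=lambda a: q_value(i, a))
            if q_value(i, best) > q_value(i, policy[i]) + 1e-12:
                policy[i] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical two-state MDP: moving from s0 costs 1 and succeeds with 0.8.
states = ["s0", "s1"]
actions = {"s0": ["stay", "go"], "s1": ["stay"]}
P = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"): {"s1": 0.8, "s0": 0.2},
     ("s1", "stay"): {"s1": 1.0}}
r = {("s0", "stay"): 0.0, ("s0", "go"): -1.0, ("s1", "stay"): 1.0}

policy, V = policy_iteration(states, actions, P, r)
```

In this example the resulting policy chooses `go` at `s0`, since the discounted reward collected at `s1` outweighs the one-step cost of moving.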

The present invention uses linear temporal logic formulas to describe complex task requirements in practical applications, and fuses the task information with the environment information to obtain a task-feasible network topology graph, so that the resulting policy satisfies the task requirements and the task is executed more efficiently. The invention improves on the traditional LTL-DRA method (DRA: deterministic Rabin automaton) by proposing an LTL-DBA method, which avoids the situation in which the conventional method fails to yield an optimal policy when the task requirement is to eventually always reach point $P$, i.e., $\varphi = GF\,P$. Exploiting the fact that the policy search efficiency depends on the environment complexity and the number of task nodes, the invention also proposes a dual-label model: using the rule that different states are not connected, together with the behavior-constraint criterion, it obtains a more concise task-feasible network topology graph, constructs an MDP from the remaining feasible states, and applies the policy iteration algorithm to obtain the optimal policy.

The advantage of the invention is that, compared with the traditional LTL-DRA method, it has better and wider applicability and obtains the optimal policy well.

Brief description of the drawings:

Fig. 1 is the LTL-MDP policy generation flowchart of the invention.

Fig. 2 is the Markov decision model of the invention.

Fig. 3 is the weighted transition system T of the invention.

Fig. 4 is the Büchi automaton corresponding to the formula $\varphi$ of the invention.

Fig. 5 is the network topology graph of the invention.

Detailed description of the embodiment

The LTL-MDP solving method of the invention is further described below through a simplified example, with reference to the accompanying drawings.

As shown in the flowchart of Fig. 1, an improved weighted transition system (Fig. 3) is first constructed from the robot's operating environment (Fig. 2). The task requirement is as follows: after pickup, the robot must reach dropoff before it can return to pickup; likewise, after dropoff, the robot must pass through pickup before it can return to dropoff. The task requirement is expressed mathematically in linear temporal logic (LTL), and the LTL task formula is converted into a Büchi automaton using the LTL2BA toolkit. Next, the Cartesian product of the two is taken to obtain a product automaton that contains both the task requirements and the environment information. Useless points (points with only incoming or only outgoing transitions) are removed from the task-feasible network topology graph, and the availability of each state point is then further judged according to the dual labels and the behavior-constraint criterion, which reduces the number of state points. The remaining points are built into an MDP model, and the optimal policy is obtained by policy iteration. The method not only handles cases in which no DRA exists, but also reduces the number of usable points, so that the constructed MDP is less complex and the optimal policy is obtained faster. The specific steps are as follows:

Step 1: construct the improved weighted transition system.

The environment in which the robot operates is modeled as an improved weighted transition system. A weighted transition system is a model of the environment, defined as a tuple $T := (Q, q_0, R, \Pi, L, w_T)$, where $Q$ is a finite set of states, namely the nodes selected in the environment; $q_0 \in Q$ is the initial state, i.e., the robot's starting point; $R: Q \to 2^Q$ is the transition relation, which gives the connectivity between states (between path points); $\Pi$ is the set of atomic propositions, i.e., the actions to be completed at each state point; $L: Q \to 2^\Pi$ is the labeling function; and $w_T$ is the transition weight, which serves as the metric value, i.e., the second label. Atomic propositions represent the attributes of each state: $\pi \in L(q)$ holds if and only if the atomic proposition $\pi$ is true at state $q$. If $q_2 \in R(q_1)$, then $q_2$ is a successor of $q_1$. Any trajectory $r_T$ of the weighted transition system consists of states of $T$, i.e., $r_T = q_0 q_1 q_2 \ldots$, where $q_{i+1} \in R(q_i)$ holds for every $i \ge 0$. The trajectory $r_T$ generates a label sequence $o = o_1 o_2 o_3 \ldots$, where $o_i \in L(q_i)$. Fig. 2 shows the MDP process of a robot; built into a weighted transition system, it becomes Fig. 3, in which the pickup action is executed at $q_1$ and the dropoff action at $q_{10}$.

Step 2: express the complex task mathematically.

The complex task is expressed mathematically using linear temporal logic. Linear temporal logic (LTL) is a high-level language close to natural language. By combining the temporal operators G (always), F (eventually), X (next), and U (until) with the Boolean operators $\neg$ (not), $\wedge$ (and), $\vee$ (or), $\rightarrow$ (implication), and $\leftrightarrow$ (equivalence), it can precisely describe the complex tasks of a mobile robot. The task formula corresponding to Fig. 2 is denoted $\varphi$.

Step 3: generate the Büchi automaton.

To combine the environment information with the task information, the LTL task formula $\varphi$ is converted by the LTL2BA toolkit into a task feasibility graph, i.e., a Büchi automaton; the automaton obtained for the task formula above is shown in Fig. 4. A Büchi automaton is a five-tuple $B := (S_B, S_{B0}, \Sigma_B, \delta_B, F_B)$, where $S_B$ is a finite set of states; $S_{B0} \subseteq S_B$ is the set of initial states; $\Sigma_B$ is the input alphabet; $\delta_B \subseteq S_B \times \Sigma_B \times S_B$ is the transition relation; and $F_B \subseteq S_B$ is the set of accepting states.

Step 4: construct the task-feasible network topology graph.

Taking the Cartesian product of the weighted transition system and the Büchi automaton yields the task-feasible network topology graph $P$, which contains both the environment information and the task information, i.e., $P = T \times B$. $P$ is a tuple $(S_P, S_{P0}, \delta_P, w_P, F_P)$, where $S_P = Q \times S_B$ is the finite set of states; $\delta_P$ is the transition function, defined by $(q_j, s_l) \in \delta_P((q_i, s_k))$ if and only if $q_j \in R(q_i)$ and $s_l \in \delta_B(s_k, L(q_i))$; $w_P$ is the weight inherited from $T$, i.e., whenever $(q_j, s_l) \in \delta_P((q_i, s_k))$, $w_P((q_i, s_k), (q_j, s_l)) = w_T(q_i, q_j)$; and $F_P = Q \times F_B$ is the set of accepting states. Selecting the useful points of the task-feasible network topology graph to construct the MDP guarantees that the resulting decision policy satisfies both the environment information and the task requirements.

Step 5: delete state points.

On the task-feasible network topology graph obtained in Step 4, useless points are rejected first, and dual labels are then introduced. The first label is the state tag: before a turning point, the state tag of a state must agree with that of the previous state. The state at P1 is pickup and the turning point is at P10, so the states P2 to P9 all carry the tag pickup; likewise, after P10 and before P1, the states P9 to P2 all carry the tag dropoff. States with different tags cannot be connected, that is, a state can have only one state tag. The second label is the metric value, for which this model chooses distance, i.e., the distance between each state and the other states. The behavior-constraint criterion is inspired by legal constraints in real life: if a next state of the robot would make the metric value worse than at the current state (farther from the target point), that next state is rejected. The robot performs its task at q1 and must then move toward P10. When the robot has reached q6, three next states, q5, q7, and q9, can be selected. The metric value of q5 is 3, i.e., it is 3 steps from the target point, whereas q7 and q9 are each 2 steps from the target point; q5 is therefore discarded and q7 and q9 are selected. Dual labels and the behavior-constraint criterion thus further simplify the task-feasible topology. As shown in Fig. 5, unusable state points are removed by the dual labels and the behavior-constraint criterion, and the remaining points are used to construct the MDP.
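The first pass of this step, rejecting useless points that have only incoming or only outgoing transitions, might be sketched as follows. Repeating the removal until nothing changes and protecting selected nodes (for example the initial state) through a `keep` argument are assumptions of this sketch; the text does not say whether the removal is iterated.

```python
def prune_dead_ends(edges, keep=()):
    """Remove nodes that have no incoming or no outgoing transition.

    edges maps each node to the set of its successors. Removal is repeated
    until no such node remains; nodes listed in keep are never removed.
    """
    graph = {u: set(vs) for u, vs in edges.items()}
    for vs in edges.values():
        for v in vs:
            graph.setdefault(v, set())   # nodes that appear only as targets
    changed = True
    while changed:
        changed = False
        has_incoming = {v for vs in graph.values() for v in vs}
        for node in list(graph):
            if node in keep:
                continue
            if not graph[node] or node not in has_incoming:
                del graph[node]
                for vs in graph.values():
                    vs.discard(node)
                changed = True
                break   # recompute incoming edges before continuing
    return graph


# Hypothetical graph: d has only an incoming edge, e has only an outgoing edge.
edges = {"a": {"b"}, "b": {"c", "d"}, "c": {"a"}, "e": {"a"}}
pruned = prune_dead_ends(edges)
```

In this example only the cycle a, b, c survives, so any policy built on the pruned graph can always continue to a next state.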

Step 6: construct the Markov decision model.

The present invention uses an NBA as the automaton synthesized from the LTL formula, which avoids the situation in which some LTL formulas have no corresponding DRA. It combines the environment information with the task formula to obtain the task-feasible network topology graph, removes useless points using dual labels and the behavior-constraint criterion, constructs an MDP from the remaining state points, and obtains the optimal policy with the policy iteration algorithm. Experimental results show that the proposed method solves this class of problems well.

The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be realized. The scope of protection of the present invention should not be construed as limited to the specific forms set forth in the embodiments; it also extends to equivalent technical means that those skilled in the art can conceive in accordance with the inventive concept.

Claims

1. An optimal policy solving method for a mobile robot in a dynamic environment, comprising the following steps:

Step 1: constructing an improved weighted transition system;

The environment in which the robot operates is modeled as an improved weighted transition system. A weighted transition system is a model of the environment, defined as a tuple $T := (Q, q_0, R, \Pi, L, w_T)$, where $Q$ is a finite set of states, namely the nodes selected in the environment; $q_0 \in Q$ is the initial state, i.e., the robot's starting point; $R: Q \to 2^Q$ is the transition relation, which gives the connectivity between states (between path points); $\Pi$ is the set of atomic propositions, i.e., the actions to be completed at each state point; $L: Q \to 2^\Pi$ is the labeling function; and $w_T$ is the transition weight, which serves as the metric value, i.e., the second label. Atomic propositions represent the attributes of each state: $\pi \in L(q)$ holds when the atomic proposition $\pi$ is true at state $q$. If $q_2 \in R(q_1)$, then $q_2$ is a successor of $q_1$. Any trajectory $r_T$ of the weighted transition system consists of states of $T$, i.e., $r_T = q_0 q_1 q_2 \ldots$, where $q_{i+1} \in R(q_i)$ holds for every $i \ge 0$; the trajectory $r_T$ generates a label sequence $o = o_1 o_2 o_3 \ldots$, where $o_i \in L(q_i)$;

Step 2: expressing the complex task mathematically;

Complex tasks can be expressed mathematically using linear temporal logic. Linear temporal logic (LTL) is a high-level language close to natural language; by combining the temporal operators G (always), F (eventually), X (next), and U (until) with the Boolean operators $\neg$ (not), $\wedge$ (and), $\vee$ (or), $\rightarrow$ (implication), and $\leftrightarrow$ (equivalence), it can precisely describe the complex tasks of a mobile robot;

Step 3: generating the Büchi automaton;

To combine the environment information with the task information, the LTL task formula $\varphi$ is converted by the LTL2BA toolkit into a task feasibility graph, i.e., a Büchi automaton. A Büchi automaton is a five-tuple $B := (S_B, S_{B0}, \Sigma_B, \delta_B, F_B)$, where $S_B$ is a finite set of states; $S_{B0} \subseteq S_B$ is the set of initial states; $\Sigma_B$ is the input alphabet; $\delta_B \subseteq S_B \times \Sigma_B \times S_B$ is the transition relation; and $F_B \subseteq S_B$ is the set of accepting states;

Step 4: constructing the task-feasible network topology graph;

Taking the Cartesian product of the weighted transition system and the Büchi automaton yields the task-feasible network topology graph $P$, which contains both the environment information and the task information, i.e., $P = T \times B$. $P$ is a tuple $(S_P, S_{P0}, \delta_P, w_P, F_P)$, where $S_P = Q \times S_B$ is the finite set of states; $\delta_P$ is the transition function, defined by $(q_j, s_l) \in \delta_P((q_i, s_k))$ if and only if $q_j \in R(q_i)$ and $s_l \in \delta_B(s_k, L(q_i))$; $w_P$ is the weight inherited from $T$, i.e., whenever $(q_j, s_l) \in \delta_P((q_i, s_k))$, $w_P((q_i, s_k), (q_j, s_l)) = w_T(q_i, q_j)$; and $F_P = Q \times F_B$ is the set of accepting states. Selecting the useful points of the task-feasible network topology graph to construct the MDP guarantees that the resulting decision policy satisfies both the environment information and the task requirements;

Step 5: deleting state points;

On the task-feasible network topology graph obtained in Step 4, useless points are rejected first, namely unreachable points that have only incoming or only outgoing transitions; such points cannot be used, because selecting them would interrupt the policy and make it impossible to obtain an optimal result. Dual labels are introduced. The first label is the state tag: before a turning point, the state tag of a state must agree with that of the previous state, i.e., states with different tags cannot be connected, that is, a state can have only one state tag. The second label is the metric value, chosen as distance, i.e., the distance between each state and the other states. The behavior-constraint criterion is inspired by legal constraints in real life: if a next state of the robot would make the metric value worse than at the current state, that next state is rejected;

Step 6: constructing the Markov decision model;

An MDP is constructed from the remaining state points. The five-tuple $M := (T, S, A(i), p(\cdot \mid i, a), r(i, a))$ is called a Markov decision process (MDP), where the time points at which actions are chosen are called decision epochs and $T$ denotes the set of all decision epochs; $S$ is the finite state space; the set $A(i)$ of actions available at state $i$ is called the action space; $p(\cdot \mid i, a)$ is the probability distribution of the system state at the next decision epoch; and $r(i, a)$ is the reward obtained by the decision maker. Once the MDP has been constructed, it is solved with the policy iteration algorithm to obtain the optimal policy.