CN117111522B - Mobile robot control method and system in dynamic environment - Google Patents

Mobile robot control method and system in dynamic environment Download PDF

Info

Publication number
CN117111522B
CN117111522B (granted publication of application CN202311205191.0A)
Authority
CN
China
Prior art keywords
state
risk
model
mobile robot
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311205191.0A
Other languages
Chinese (zh)
Other versions
CN117111522A (en)
Inventor
宓建
邓社军
徐伟
廖华军
白乐濛
张俊
秦婧逸
于世军
嵇涛
徐悦
马瑞阳
沈梓怡
朱云翔
蔡爱鹏
崔嘉贺
张昱韬
闫奇志
张洋铭
张炳坤
艾尔帕尼·茹扎洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huizhi Technology Yangzhou Co ltd
Yangzhou University
Original Assignee
Huizhi Technology Yangzhou Co ltd
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huizhi Technology Yangzhou Co ltd, Yangzhou University filed Critical Huizhi Technology Yangzhou Co ltd
Priority to CN202311205191.0A priority Critical patent/CN117111522B/en
Publication of CN117111522A publication Critical patent/CN117111522A/en
Application granted granted Critical
Publication of CN117111522B publication Critical patent/CN117111522B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/04 Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042 Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G05B19/0423 Input/output
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/25 Pc structure of the system
    • G05B2219/25257 Microcontroller
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a mobile robot control method and system in a dynamic environment, the method comprising the following steps: constructing an encoder based on the scLTL logic specification, and encoding the multi-dimensional task state of the mobile robot into a one-dimensional state; predicting and evaluating the risks of uncertain factors in the mobile robot's working environment based on historical data and risk simulation; and fusing the encoder's encoding with reinforcement learning, constructing a fusion model that takes the predicted risks into account, and obtaining decisions and path planning for the mobile robot's multi-task operation through the fusion model. The invention adopts a logic specification for multi-task control, considers the risks of uncertain factors in a dynamic environment, and proposes an encoding reinforcement learning fusion algorithm that performs safe path planning, avoids risks in the environment, and generates an optimal safe path for the mobile robot to execute multi-task operations. Potential risks in the environment are avoided in advance, complex problems are simplified, the exploration space is compressed, the calculation cost is reduced, and the solving speed is improved.

Description

Mobile robot control method and system in dynamic environment
Technical Field
The invention relates to logistics robot operation, in particular to a mobile robot control method and system in a dynamic environment.
Background
With the advancement of artificial intelligence technology, the demand for robots that perform multiple tasks is increasing; however, robot control technology for multi-task execution still needs improvement, especially multi-task control and safe path planning under uncertain risks. One typical application scenario is the distribution of materials between university campuses. Many universities now have multiple campuses distributed across a city; the distance between campuses makes material transfer inconvenient, and the transport of files, materials, and articles across campuses relies heavily on manual delivery, which wastes time and effort and affects the office efficiency of staff and the daily life of students. Developing a robot control system suitable for multi-task execution and realizing automatic cross-campus distribution would provide great convenience for university teachers and students.
The main difficulties the robot faces in performing multiple tasks are multi-task control and safe path planning. Existing research shows that a plain reinforcement learning algorithm can effectively solve the robot path planning problem in the single-task case. However, it cannot handle multiple tasks: for multi-task operation, the task must be divided into several stages and multiple reward and punishment functions must be defined, which is complex. Linear temporal logic specifications can effectively perform multi-task control and can be fused with reinforcement learning, so that task control and path planning can be solved together. However, the existing fusion methods of linear temporal logic and reinforcement learning are mainly product-type, their calculation cost is high, and they do not consider the risk caused by uncertain factors in the environment.
Disclosure of Invention
The invention aims to provide a mobile robot control method and a mobile robot control system in a dynamic environment, so that potential risks in the environment are avoided in advance, complex problems are simplified, an exploration space is compressed, calculation cost is reduced, and solving speed is improved.
The technical scheme for realizing the purpose of the invention is as follows:
a mobile robot control method under dynamic environment includes the steps:
constructing an encoder based on scLTL logic specifications, and encoding the multi-dimensional task state of the mobile robot into a one-dimensional state;
predicting and evaluating risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation;
and (3) coding and reinforcement learning by a fusion coder, constructing a fusion model by taking prediction risks in the environment into consideration, and obtaining the multi-task operation of the mobile robot through the fusion model to carry out decision making and path planning.
Further, constructing an encoder based on the scLTL logic specification specifically includes:
multi-task atomic proposition: according to each single-task state τ_i, atomic propositions are made for the I tasks, and the constructed scLTL logic specification task model φ is written;
converting the task model φ into a finite state automaton A_φ;
constructing an encoder based on the finite state automaton A_φ and the robot multi-task state.
Further, the scLTL logic specification task model φ follows the standard scLTL syntax:
φ ::= T | p | ¬p | φ_1 ∧ φ_2 | φ_1 ∨ φ_2 | Xφ | φ_1 U φ_2 | Fφ, with p ∈ PrP,
wherein T represents the Boolean operator true, PrP is the atomic proposition set, φ_1, φ_2 represent atomic propositions, X represents the next step (Xφ means φ is true in the next state), F represents the future (Fφ means φ is true at some future point), and φ_1 U φ_2 means φ_1 is true until φ_2 is satisfied.
Further, the finite state automaton A_φ is:
A_φ = <Q, 2^PrP, δ, q_0, q_F>
wherein Q represents a finite state set, 2^PrP represents the finite alphabet over the atomic proposition set, δ: Q × 2^PrP → Q is the state transition function, q_0 represents the initial state, and q_F represents the finite set of accepting states.
Further, the encoder is:
m = Encoder(s_T) = Encoder(τ_1, τ_2, …, τ_i, …, τ_I)
wherein m ∈ M is the encoded robot multi-task state and is one-dimensional, and the multi-task state is s_T = (τ_1, τ_2, …, τ_i, …, τ_I).
Further, predicting and evaluating the risks of uncertain factors in the working environment of the mobile robot based on historical data and risk simulation specifically comprises the following steps:
constructing a historical risk model based on the historical risk data, iteratively learning the historical risk model until the requirement is met, and predicting the risk within a future time period T_0 through the historical risk model;
simulating possible sudden events in the real environment, constructing a random simulation model, and calculating the risk of each road within the future time period T_0 based on a statistical method.
Further, for the historical risk model:
wherein the historical risk data set of the time period T is the model input, c is calculated by a kernel equation, the model output value and the learning parameters α, β, ε are determined through repeated iterative learning, and k is the path number;
the iterative learning process of the model is performed by a minimization in which C_1, C_2 are matrices calculated from the kernel function, γ is a constant, K is the dimension of the training data, tr(·) is the trace of a matrix, J is the data length, d is the dimension of the model constructed by learning, and the training data set is used;
the risk within the future time period T_0 is then predicted by the model,
wherein Y_t is the risk model value at time t, and the transpose of the constructed risk model data set is used.
Further, in the stochastic simulation model, the model output is the risk information of road segment k at time t, and E_f is the emergency information;
the random risk within the future time period T_0 is obtained by the simulation;
the risk within the future time period T_0 is then calculated, wherein α′, β′ are balance parameters and a Gaussian noise term is included.
Further, the fusion model is defined by a tuple comprising: the encoded state set S (with x and y the robot environment coordinate values), a finite action set A, a state transition probability function P, an initial state s_0, a reward function R, the atomic proposition set PrP, a label function L, a finite set of accepting states F, and the risk information R_isk.
Further, the fusion model adopts Q-Learning for policy learning and updates the Q value using the Bellman equation until convergence, the Bellman equation being:
Q(s, a) ← Q(s, a) + α″ [ r + γ′ max_{a′} Q(s′, a′) − Q(s, a) ]
where Q(s, a) is the Q value of state s and action a, α″ is the learning rate, γ′ is the discount factor, r is the reward, s′ is the updated state, and a′ is the updated action.
A mobile robot control system in a dynamic environment comprises a multi-task encoder module, a risk prediction evaluation module and a path planning module, wherein:
the multi-task encoder module constructs an encoder based on scLTL logic specifications, and encodes the multi-dimensional task state of the mobile robot into a one-dimensional state;
the risk prediction evaluation module predicts and evaluates risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation;
the path planning module fuses the coding of the coder and reinforcement learning, and considers the prediction risk in the environment to construct a fusion model, and the multi-task operation of the mobile robot is obtained through the fusion model to carry out decision making and path planning.
Compared with the existing mobile robot control, the invention has the following beneficial effects:
(1) The scLTL logic formula is used for multi-task management, a robot multi-task state encoder is constructed, and the multi-task state of the robot is converted into a one-dimensional single state, compressing the state space; the more tasks there are, the more pronounced the compression becomes;
(2) Risks caused by uncertain factors in the environment are considered, and an optimal risk-avoiding operation path is planned for the robot at the planning level, improving the robot's safety guarantee; this is a problem that neither the traditional MDP algorithm nor the product-type scLTL-MDP algorithm can solve;
(3) Compared with the traditional MDP and scLTL algorithms, the proposed algorithm architecture is more effective and its calculation cost is lower.
Drawings
FIG. 1 is a diagram of a control system architecture of the present invention.
Fig. 2 is a block diagram of a multitasking control module.
FIG. 3 is a block diagram of risk prediction evaluation.
Fig. 4 is a block diagram of path planning.
FIG. 5 is a flow chart of a control method of the present invention.
Fig. 6 is a simplified map of the Yangzijin and Lotus Pond campuses of Yangzhou University.
Fig. 7 shows path planning under smooth on-campus traffic: fig. 7(a) is the path plan for the Yangzijin campus, and fig. 7(b) is the path plan for the Lotus Pond campus.
Fig. 8 shows results that consider the risk of on-campus traffic congestion: fig. 8(a) is a schematic diagram of the operation path, and fig. 8(b) shows the relationship between reward and number of learning iterations.
FIG. 9 is a graph of experimental results in a 24×60 environment.
Fig. 10 is a diagram of learning process and state compression results.
Detailed Description
The invention provides a mobile robot control method that considers the risks of uncertain factors in a dynamic environment, and provides a mobile control system for multi-task robot operation under environmental risk in an uncertain environment. The system adopts the syntactically co-safe linear temporal logic (scLTL) specification for multi-task control, considers the risks of uncertain factors in a dynamic environment, proposes an encoding reinforcement learning fusion algorithm, performs safe path planning, avoids environmental risks at the planning level, generates an optimal safe path for the mobile robot to execute multi-task operations, and avoids potential risks in the environment in advance. The proposed algorithm first uses the scLTL specification to express the multiple tasks as atomic propositions and manages them with a single formula. Second, it constructs a task encoder that encodes the multi-dimensional multi-task state into a one-dimensional single state, yielding a multi-task-oriented encoding reinforcement learning algorithm that converts the complex multi-task path planning problem into the problem of finding an optimal strategy satisfying the scLTL formula; this simplifies the problem, compresses the exploration space, reduces the calculation cost, and improves the solving speed. Finally, risks caused by uncertain factors in the dynamic environment are incorporated into the algorithm, and a safety policy solving algorithm oriented to dynamic environment risk is constructed; this solves the problem that the traditional Markov Decision Process (MDP) cannot be applied to dynamic environments, avoids risks in advance at the planning level, and provides an optimal safe control policy and operation path for mobile robot operation.
As shown in fig. 1, the risk-aware mobile robot control system provided in this embodiment is composed of the following 3 parts:
S1: multi-task encoder;
S2: risk prediction and evaluation;
S3: path planning.
The key technical principles of the invention are as follows:
s1 multitasking coding controller
As shown in the S1 part of fig. 1, the present project mainly performs multitasking control of a robot through scLTL logic specification, so as to construct an encoder, and encode a multidimensional task state into a one-dimensional state, thereby compressing a state space, and the technical principle is as follows:
firstly, the multitasking atoms are made into questions, the formula (1) is utilized to carry out arrangement expression,
wherein:
T represents the Boolean operator "true";
PrP is the atomic proposition set;
φ, φ_1, φ_2 represent atomic propositions;
X (next) represents the next step; Xφ means φ is true in the next state;
F (future) represents the future; Fφ = T U φ means φ is true at some future point;
U (until): φ_1 U φ_2 means φ_1 is true until φ_2 is satisfied.
Through this logical expression, the atomic propositions of all tasks can be written into formula (1), so that they are controlled and managed in a unified way.
Secondly, a finite state automaton A_φ is constructed, converting the scLTL formula (1) φ into the finite state automaton (FSA) A_φ:
A_φ = <Q, 2^PrP, δ, q_0, q_F>……(2),
wherein Q: finite state set; 2^PrP: finite alphabet over the atomic propositions; δ: Q × 2^PrP → Q state transition function; q_0: initial state; q_F: finite set of accepting states. Task progress is tracked through state transitions of the FSA A_φ.
The robot executes I tasks, and the multi-task state is s_T = (τ_1, τ_2, …, τ_i, …, τ_I), i.e., the task state of the robot is defined by the high-dimensional s_T, whose dimension is I. The number of states of each single task is determined by |τ_i|. The multi-task state space is of size |τ_i|^I; the whole state space grows exponentially as the number of tasks I increases, which increases the difficulty of solving the robot's optimal control strategy.
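As a concrete illustration of the FSA state transitions described above, the following Python sketch tracks a two-task specification F c_1 ∧ F c_2 (both pick-ups must eventually be completed, in any order); the state representation and function names are illustrative assumptions, not taken from the patent:

```python
# Hypothetical FSA for the scLTL formula "F c1 AND F c2": each automaton
# state records which atomic propositions have been satisfied so far.

def step(state, labels):
    """Transition function delta: Q x 2^PrP -> Q for this two-task example."""
    return frozenset(set(state) | (labels & {"c1", "c2"}))

q0 = frozenset()                 # initial state q_0: nothing done yet
qF = frozenset({"c1", "c2"})     # accepting state q_F: both tasks done

q = q0
for observed in [set(), {"c1"}, set(), {"c2"}]:  # labels observed along a run
    q = step(q, observed)
assert q == qF                   # this run satisfies the specification
```

Because the FSA state depends only on which propositions have been seen, the order in which c1 and c2 occur does not matter, mirroring the "in any order" reading of F c_1 ∧ F c_2.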
As shown in FIG. 2, the S1 module mainly consists of the multi-task atomic proposition (S1-1), the finite state automaton construction (S1-2), and the constructed encoder Encoder(·) (S1-3).
Multi-task atomic proposition (S1-1): according to each single-task state τ_i, atomic propositions are made for the I tasks, and the formula φ is written;
Constructing a finite state automaton (S1-2): the task formula φ is converted into the finite state automaton A_φ;
Building the encoder (S1-3): based on the finite state automaton A_φ (its state transition function δ) and the robot multi-task state s_T = (τ_1, τ_2, …, τ_i, …, τ_I), an encoder is constructed, defined by equation (3):
m = Encoder(s_T) = Encoder(τ_1, τ_2, …, τ_i, …, τ_I)……(3),
wherein m ∈ M is the encoded robot multi-task state and is one-dimensional, so that the high-dimensional robot multi-task state is converted into a one-dimensional state and the state space is compressed. Since |M| < |τ_i|^I, the space-compression effect becomes more pronounced as the number of tasks increases.
The inputs and outputs of the S1 module are as follows:
Input: robot multi-task state s_T = (τ_1, τ_2, …, τ_i, …, τ_I);
Output: the robot multi-task encoded state m.
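To make equation (3) concrete, here is a minimal encoder sketch that assumes, for illustration, a fixed number of discrete states per task; the patent's encoder additionally prunes unreachable combinations via the FSA, which is why its |M| is smaller than this full product:

```python
# Mixed-radix sketch of m = Encoder(s_T): flatten the I-dimensional task
# state (tau_1, ..., tau_I) into a single integer index m.

def encode(task_states, radices):
    m = 0
    for tau, radix in zip(task_states, radices):
        m = m * radix + tau      # shift previous digits, append tau
    return m

# Three tasks, each with 2 states (0 = pending, 1 = completed): 2^3 codes.
radices = (2, 2, 2)
assert encode((0, 0, 0), radices) == 0
assert encode((1, 0, 1), radices) == 5
assert encode((1, 1, 1), radices) == 7
```

The encoding is invertible (it is just positional notation), so a planner can work entirely in the one-dimensional index m and recover the per-task states when needed.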
S2: risk prediction assessment
As shown in part S2 of fig. 1, this module predicts and evaluates the risks of uncertain factors in the environment, such as the risk of traffic congestion (congestion is only an example; risks caused by any uncertain factor in the environment are applicable).
As shown in FIG. 3, the S2 module mainly comprises a historical risk model (S2-1), a risk prediction (S2-2), a random simulation mechanism (S2-3), a risk simulation (S2-4) and a risk calculation (S2-5) module.
The historical risk model (S2-1) is constructed based on historical risk data. The historical risk data set for the time period T contains, for each road segment numbered k, its traffic risk information within that period. The risk model based on historical data is constructed as follows:
(1) Initialize the model data and the model construction parameters α, β, ε;
(2) The learning model is defined by formulas (4)-(7), wherein c can be calculated from a kernel equation, and α, β, ε are obtained through repeated iterative learning;
(3) The whole learning process is carried out by minimizing formula (8), wherein C_1, C_2 are matrices calculated from the kernel function, K is the dimension of the training data, tr(·) is the trace of a matrix, J is the data length, d is the dimension of the model constructed by learning, the integrated data set is used, and γ is a constant;
(4) Iterate the learning L times, thereby constructing the risk model based on historical data.
Risk prediction (S2-2) predicts risk based on the historical risk model; the risk within a future time period T_0 can be predicted by formula (9), wherein Y_t is the risk model value at time t and the transpose of the risk model data set constructed in the low-dimensional space is used.
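Since formulas (4)-(9) are given only in the patent drawings, the following stand-in illustrates the general idea of history-based risk prediction with a Gaussian-kernel weighted average; the function, its parameters, and the data are illustrative assumptions, not the patent's actual model:

```python
# Predict the risk of one road at a future time as a kernel-weighted
# average of its historical (time, risk) observations.

import math

def predict_risk(history, horizon, bandwidth=2.0):
    t_query = history[-1][0] + horizon            # time T0 ahead of the last sample
    weights = [math.exp(-(t - t_query) ** 2 / (2 * bandwidth ** 2))
               for t, _ in history]
    return sum(w * r for w, (_, r) in zip(weights, history)) / sum(weights)

hist = [(0, 0.1), (1, 0.2), (2, 0.4), (3, 0.5)]   # congestion risk rising over time
risk = predict_risk(hist, horizon=1)
assert 0.1 <= risk <= 0.5    # a weighted average stays within the data range
```

Recent observations get higher kernel weight, so the prediction leans toward the latest trend, which matches the idea of extrapolating from historical experience.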
The main drawback of this approach is that the model is built from historical data, so risk prediction relies too heavily on historical experience; real situations are highly random, and prediction based on a historical-data model alone cannot handle this randomness well. To overcome this deficiency, the invention provides a new risk calculation method: the problem is addressed by the random policy simulation mechanism (S2-3) and risk simulation (S2-4).
Random policy simulation mechanism (S2-3): this mechanism simulates emergency events that may occur in the real environment through a stochastic simulation model, defined by formula (10), wherein the model output is the risk information of road segment k at time t and E_f is the emergency information.
Risk simulation (S2-4): based on the random policy simulation mechanism (S2-3) and the current state, a large number of random risk simulations are performed, and the risk of each road within the future time period T_0 is calculated using a statistical method, giving the random risk within T_0.
Risk calculation (S2-5): the risk within the future time period T_0 is calculated by formula (11), wherein α′, β′ are balance parameters and a Gaussian noise term is included. Note that the first term is the risk predicted from historical data, which focuses on past history and lacks responsiveness to sudden uncertainty, while the second term is the random risk calculated by random simulation, which focuses on the randomness and uncertainty of real conditions and compensates for the deficiency of the history-based prediction.
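A minimal sketch of S2-3 through S2-5 combined, assuming a Bernoulli emergency model and illustrative balance parameters (the Gaussian noise term of formula (11) is omitted here for determinism; all names and constants are assumptions):

```python
# Monte Carlo random-risk simulation (S2-4) plus the blended risk of (S2-5).

import random

def simulate_risk(base_risk, emergency_prob, n_runs=10_000, seed=0):
    """Estimate the probability that a road is risky within T0, allowing a
    random emergency event E_f to independently trigger risk."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_runs)
               if rng.random() < base_risk or rng.random() < emergency_prob)
    return hits / n_runs

def combined_risk(hist_risk, sim_risk, alpha_p=0.6, beta_p=0.4):
    # Blend history-based and simulation-based risk (cf. formula (11)).
    return alpha_p * hist_risk + beta_p * sim_risk

r_sim = simulate_risk(base_risk=0.2, emergency_prob=0.05)
r = combined_risk(hist_risk=0.3, sim_risk=r_sim)
assert 0.0 < r < 1.0
```

With independent triggers, the true per-run risk probability is 1 − (1 − 0.2)(1 − 0.05) = 0.24, and the Monte Carlo estimate converges to it as n_runs grows.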
In summary, the input and output of the S2 module are as follows:
Input: environmental information X_t;
Output: risk information R_isk.
Taking traffic congestion risk as an example: the input is the traffic environment information at the current time t, and the output is the traffic congestion risk probability of each road section and of the road network over a future period.
S3: path planning
As shown in part S3 of fig. 1, this module fuses scLTL and reinforcement learning and considers risks in the environment, forming a new fusion algorithm that makes decisions and plans paths for the mobile robot's multi-task operation.
As shown in FIG. 4, the encoded scLTL and reinforcement learning fusion algorithm fuses the encoding of the finite state automaton (S3-1) with the MDP (Markov Decision Process) (S3-2). The inputs and outputs of the S3 module are as follows:
Input: task information, risk information, finite state automaton A_φ;
Output: multi-task policies and paths.
The main technical principle is as follows:
the proposed fusion algorithm model consists ofDefinition, wherein->Is the encoded state set,/->x and y are robot environment coordinate values, < + >>Is a finite set of actions, < >>For the state transition probability function +.> In the initial state of the device, the device is in a state of being in an initial state,for the reward function, prP is an atomic proposition set, L: s2 PrP For the tag function +.>R is a finite accepted state set isk Is risk information.
The conventional MDP or scLTL-MDP fusion algorithm is a product-type algorithm: the whole exploration space is enlarged, and it cannot handle environmental risk. Compared with these traditional algorithms, the algorithm provided by the invention constructs a robot multi-task state encoder, thereby converting the high-dimensional robot multi-task state into a one-dimensional single state, realizing space compression, and, based on the environmental risk R_isk, solving an optimal strategy that can cope with dynamic environment risk.
The project uses Q-Learning for policy learning and updates the Q value with the Bellman equation until convergence:
Q(s, a) ← Q(s, a) + α″ [ r + γ′ max_{a′} Q(s′, a′) − Q(s, a) ]
where Q(s, a) is the Q value of state s and action a, α″ is the learning rate, γ′ is the discount factor, and r is the reward.
The algorithm pseudocode is as follows:
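The pseudocode itself appears only in the patent drawings; the sketch below shows the same tabular Q-Learning update on a toy 1-D corridor with one risky cell, using the reward structure of Table 1. The environment, constants, and penalty values are illustrative assumptions, not the patent's implementation:

```python
import random

N, GOAL, RISKY = 6, 5, 3            # corridor cells 0..5, goal at 5, risky cell at 3
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
rng = random.Random(0)

def reward(s_next):
    r = -0.1                        # basic step cost (Table 1)
    if s_next == RISKY:
        r -= 1.0                    # congestion-risk penalty (r = -con)
    if s_next == GOAL:
        r += 1.0                    # task completion (scLTL formula satisfied)
    return r

for _ in range(500):                # learning episodes
    s = 0
    while s != GOAL:
        if rng.random() < EPS:      # epsilon-greedy action selection
            a = rng.choice((-1, 1))
        else:
            a = max((-1, 1), key=lambda b: Q[(s, b)])
        s2 = min(max(s + a, 0), N - 1)
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, b)] for b in (-1, 1))
        # Bellman update: Q += alpha * (r + gamma * max_a' Q(s',a') - Q)
        Q[(s, a)] += ALPHA * (reward(s2) + GAMMA * best_next - Q[(s, a)])
        s = s2

assert Q[(0, 1)] > Q[(0, -1)]       # the learned policy heads toward the goal
```

In the patent's setting, the state index would be the encoded pair (m, x, y) and the risk penalty would come from the S2 module rather than a fixed cell, but the update rule is the same.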
a flowchart of a system developed by the present invention is shown in fig. 5, comprising:
the first step: initializing various parameters;
and a second step of: multitasking;
and a third step of: utilizing scLTL to carry out multitasking atomic proposition, constructing a corresponding finite state automaton, and carrying out multitasking management;
fourth step: building an encoder
Fifth step: comprehensively considering environmental information and uncertain risks by using the proposed path planning algorithm, and carrying out path planning to generate a multi-task operation strategy and a path;
sixth step: the robot executes multitasking operation;
seventh step: if the robot encounters operation interruption in the executing process, returning to the second step, and re-planning the path;
eighth step: the job ends.
The invention aims to provide an optimal control strategy for the robot to execute multi-task operations and to plan a safe operation path that accounts for risks caused by uncertain factors in the environment. On one hand, the invention uses the scLTL logic specification to express all tasks as atomic propositions forming a logic formula, constructs an encoder, effectively controls and uniformly manages the multiple tasks, and encodes the multi-dimensional task state into a one-dimensional state, thereby compressing the state space and reducing the calculation cost. On the other hand, the invention considers uncertain risks in the environment, provides an encoding-based scLTL-MDP algorithm, calculates the risk distribution caused by uncertain factors in the environment by fusing a historical data model with a simulation method, generates an optimal safety strategy for the mobile robot to execute multiple tasks in an uncertain environment, avoids risks in advance, and provides an optimal safe path for the robot's multi-task operation.
The invention can be applied to robot multitasking, and mainly comprises the fields of automation and logistics transportation, such as file material distribution, factory automation distribution and logistics distribution in each school district of a college.
Examples
The automatic distribution operation control of robots across the campuses of Yangzhou University is used as an application case for explanation.
Fig. 6 shows a simplified map of the Yangzijin and Lotus Pond campuses of Yangzhou University. After picking up items in the Yangzijin campus, the robot rides the school bus to the Lotus Pond campus for delivery service.
As shown in figure 6, the robot starts from s_0 in the Yangzijin campus and goes to the pick-up points C_1 (west apartment), C_2, and C_3 to pick up items. After pick-up is completed, it carries the items to the east gate, rides the school bus to the Lotus Pond campus, alights at s_1, goes to the delivery point at administrative building D_1, and delivers all the files at the administrative building.
All the task atomic propositions are controlled by formula (1) in the specific implementation. The task atomic propositions are defined as follows:
(1) Atomic proposition c_i represents successful completion of picking up items at pick-up point C_i; for example, c_1 represents successful completion of the pick-up task at the apartment;
(2) Atomic proposition b represents successfully boarding the school bus to the next campus; for example, boarding at the east gate to go to the Lotus Pond campus;
(3) Atomic proposition d_i represents successful completion of the delivery task at point D_i; for example, d_1 indicates the delivery task is completed at the D_1 administrative building. Note that in the illustrated example, all the collected file materials are delivered to the administrative building in the Lotus Pond campus, i.e., the delivery points corresponding to the three propositions d_1, d_2, d_3 are in fact one delivery point.
The propositions are written into the scLTL formula φ, whose structure (reconstructed from the definitions below) is:
φ = (∧_i F c_i) ∧ F b ∧ (∧_i F d_i) ∧ (¬b U (c_1 ∧ c_2 ∧ c_3)) ∧ (∧_i (¬d_i U b)) ∧ (∧_i (¬d_i U c_i)),
wherein Fc_i indicates that the robot will successfully complete pick-up at point C_i in the future; Fb indicates that the robot will successfully ride the school bus to the next campus; ¬b U (c_1 ∧ c_2 ∧ c_3) indicates that the robot cannot go to the next campus before completing the pick-up tasks; Fd_i indicates that the robot will successfully complete the delivery task at point D_i in the future; ¬d_i U b indicates that the robot will not perform the D_i delivery task until it boards the school bus to the next campus; ¬d_i U c_i indicates that if the robot has not collected the documents at point C_i, it will not perform the D_i delivery task.
The reward function of the proposed algorithm is defined in the following table.
TABLE 1 Reward function definition table

Reward      Definition
r = -0.1    basic step cost
r = +1.0    scLTL formula φ satisfied
r = -con    traffic congestion risk

wherein con is a constant determined by the traffic congestion risk distribution.
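Read as code, Table 1 suggests a per-step reward of the following shape; the argument names and the use of the predicted risk as a multiplier on `con` are assumptions for illustration, not the patent's exact definition:

```python
# Reward function sketch following Table 1.

def reward(phi_satisfied, congestion_risk, con=1.0):
    r = -0.1                       # basic step cost
    if phi_satisfied:
        r += 1.0                   # scLTL formula phi satisfied
    r -= con * congestion_risk     # traffic congestion risk penalty
    return r

assert reward(False, 0.0) == -0.1
assert abs(reward(True, 0.5) - 0.4) < 1e-9
```

Scaling the penalty by the predicted risk lets the planner trade a longer detour against the expected cost of entering a congested road section.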
The source code of the project algorithm is written in Python, and the operation paths generated by solving the algorithm are shown in fig. 7 and fig. 8, where fig. 7 shows path planning under smooth on-campus traffic and fig. 8 considers the risk of on-campus traffic congestion.
Fig. 7 corresponds to fig. 6: the dark blue grid cells are buildings and other impassable areas, the light blue cells form the robot's operation path, the green point is the departure position, and the purple points are positions where the robot performs delivery tasks.
After the simulation experiment without on-campus congestion risk, a simulation experiment with traffic congestion risk in the Yangzijin campus was designed. As shown in fig. 8(a), the red grid cells represent road sections with congestion risk. The simulation results show that the proposed algorithm effectively solves the path planning problem under congestion risk: it avoids road sections with higher risk and provides a safe, efficient operation path for the robot. Fig. 8(b) shows the change of reward with the number of learning iterations during training; the reward no longer increases after about 200 iterations and reaches convergence, i.e., the optimal strategy has been learned.
State space compression: as shown in Fig. 9, taking a 24×60 environment with 6 tasks as an example, red road segments mark congestion risk, with darker red indicating a higher risk level. Fig. 10 shows the learning convergence and the state space size; the state space is reduced by 76.6%.
The test results show that the proposed algorithm can effectively control the multi-task operation of the robot, generating an effective and safe operation path for robot operation in a risky environment.

Claims (9)

1. A mobile robot control method in a dynamic environment, characterized by comprising the following steps:
step 1: constructing an encoder based on scLTL logic specifications, and encoding the multi-dimensional task state of the mobile robot into a one-dimensional state;
step 2: predicting and evaluating risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation;
step 3: coding and reinforcement learning by a fusion coder, constructing a fusion model by taking prediction risks in the environment into consideration, and obtaining a multi-task operation of the mobile robot through the fusion model to carry out decision making and path planning;
the fusion modelWherein (1)>Is the encoded state set,/->m is Q, x, y is the robot environment coordinate value, < + >>Is a finite set of actions, < >>For the state transition probability function +.>For the initial state +.>As a reward function, prP is an atomic proposition set, L is a tag function, ++>R is a finite accepted state set isk Is risk information;
and the fusion model adopts Q-Learning for strategy learning, updating the Q value with the Bellman equation until convergence, the Bellman equation being:

Q(s,a) ← Q(s,a) + α [ r + γ max_{a′} Q(s′,a′) − Q(s,a) ]

where Q(s,a) is the Q value of state s and action a, α is the learning rate, γ is the discount factor, r is the reward, s′ is the updated state, and a′ is the updated action.
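The Q-Learning update in this claim can be sketched as a single tabular update step. The function name and hyperparameter values are illustrative assumptions, not the patent's implementation:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step implementing the Bellman update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action) pairs, defaulting to 0.0."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # max over next actions
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]
```

Starting from an all-zero table, a transition with reward 1.0 updates Q(s,a) to alpha × 1.0.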
2. The method of claim 1, wherein constructing an encoder based on the scLTL logic specification comprises:
forming multi-task atomic propositions: performing an atomic proposition for each task i based on its single task state τ_i, and writing the constructed scLTL logic specification task model φ;
converting the task model φ into a finite state automaton A_φ;
constructing the encoder based on the finite state automaton A_φ and the robot multi-task state.
3. The method for controlling a mobile robot in a dynamic environment according to claim 2, wherein the scLTL logic specification task model φ is given by the grammar:

φ ::= T | p | ¬p | φ_1 ∧ φ_2 | φ_1 ∨ φ_2 | Xφ | Fφ | φ_1 U φ_2

wherein T represents the Boolean constant true; p ∈ PrP is an atomic proposition, with PrP the atomic proposition set; φ_1, φ_2 represent scLTL formulas; X represents the next-step operator, Xφ meaning that φ is true in the next state; F represents the future operator, Fφ meaning that φ is true at some future state; and φ_1 U φ_2 represents that φ_1 remains true until φ_2 is satisfied.
4. The method for controlling a mobile robot in a dynamic environment according to claim 2, wherein the finite state automaton A_φ is:

A_φ = ⟨Q, 2^PrP, δ, q_0, q_F⟩

wherein Q represents a finite state set; 2^PrP represents the finite input alphabet formed by subsets of the atomic proposition set; δ: Q × 2^PrP → Q represents the state transition function; q_0 represents the initial state; and q_F represents the finite set of accepting states.
5. The method for controlling a mobile robot in a dynamic environment according to claim 1, wherein the encoder is:

m = Encoder(s_T) = Encoder(τ_1, τ_2, …, τ_i, …, τ_I)

wherein m ∈ Q is the encoded robot multi-task state, of dimension one, and s_T = (τ_1, τ_2, …, τ_i, …, τ_I) is the multi-task state.
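One minimal way to realize such an encoder is a mixed-radix mapping from the task tuple to a single integer. This generic sketch is illustrative only; the patent derives the one-dimensional state from the automaton A_φ rather than from this encoding:

```python
def build_encoder(task_sizes):
    """Return an encoder mapping a multi-task state (tau_1, ..., tau_I),
    where each tau_i takes values in {0, ..., task_sizes[i]-1}, to a single
    integer m via a mixed-radix encoding. The mapping is bijective, so no
    task information is lost in the compression to one dimension."""
    def encoder(s_T):
        m = 0
        for tau, size in zip(s_T, task_sizes):
            m = m * size + tau  # shift by this task's radix, add its digit
        return m
    return encoder
```

For example, with three tasks of sizes (2, 2, 3) the tuple (1, 1, 2) maps to (1·2 + 1)·3 + 2 = 11.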
6. The method for controlling a mobile robot in a dynamic environment according to claim 1, wherein predicting and evaluating the risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation specifically comprises:
building a historical risk model based on historical risk data, iteratively learning the historical risk model until it meets the requirement, and predicting the risk in a future time period T_0 through the historical risk model;
simulating possible emergencies in the real environment, constructing a random simulation model, and calculating the risk of each road in the future time period T_0 based on a statistical method.
7. The method for controlling a mobile robot in a dynamic environment according to claim 6, wherein, in the historical risk model:
C is the historical risk data set of time period T, computed by a kernel function; ŷ is the model output value; α, β, ε are model learning parameters determined by repeated iterative learning; and k is the path number;
the iterative learning process of the model is performed by minimizing a formula wherein C_1, C_2 are matrices computed by the kernel function, γ is a constant, K is the dimension of the training data, tr(·) is the trace of a matrix, J is the data length, d is the dimension of the model constructed by learning, and the remaining symbol denotes the training dataset;
and in the risk over the future time period T_0, Y_t is the risk model value at time t and the remaining symbol denotes the transpose of the constructed risk model data set.
8. The method for controlling a mobile robot in a dynamic environment according to claim 6, wherein, in the random simulation model, the model output is the risk information of road segment k at time t and e_f is the emergency information;
the random-simulation risk over the future time period T_0 is computed from the model;
and the overall risk over the future time period T_0 combines the historical prediction and the random-simulation risk, wherein α′, β′ are balance parameters and the remaining term is Gaussian noise.
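The weighted fusion of the two risk estimates described in this claim can be sketched as follows. The function name, weights, and noise scale are illustrative assumptions, not the patent's values:

```python
import random

def fused_risk(r_hist, r_sim, alpha_p=0.6, beta_p=0.4, sigma=0.01, rng=None):
    """Combine the historical-model risk prediction r_hist and the
    random-simulation risk r_sim (per-road lists) for the future period T0
    as a weighted sum with balance parameters alpha', beta' plus Gaussian
    noise, following the description in claim 8."""
    rng = rng or random.Random(0)  # seeded for reproducibility in this sketch
    return [alpha_p * h + beta_p * s + rng.gauss(0.0, sigma)
            for h, s in zip(r_hist, r_sim)]
```

With sigma set to zero the result reduces to the pure weighted combination, which is convenient for sanity-checking the balance parameters.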
9. A mobile robot control system in a dynamic environment, characterized by comprising a multi-task encoder module, a risk prediction and evaluation module, and a path planning module, wherein:
the multi-task encoder module constructs an encoder based on the scLTL logic specification and encodes the multi-dimensional task state of the mobile robot into a one-dimensional state;
the risk prediction and evaluation module predicts and evaluates the risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation;
the path planning module fuses the encoder coding with reinforcement learning, constructs a fusion model that takes the predicted risk in the environment into account, and obtains decision making and path planning for the multi-task operation of the mobile robot through the fusion model;
the fusion model is M = ⟨S, A, P, s_0, R, PrP, L, F, Risk⟩, wherein S is the encoded state set, a state s = (m, x, y) with m ∈ Q and x, y the robot environment coordinate values; A is a finite action set; P is the state transition probability function; s_0 is the initial state; R is the reward function; PrP is the atomic proposition set; L is the label function; F is the finite set of accepting states; and Risk is the risk information;
and the fusion model adopts Q-Learning for strategy learning, updating the Q value with the Bellman equation until convergence, the Bellman equation being:

Q(s,a) ← Q(s,a) + α [ r + γ max_{a′} Q(s′,a′) − Q(s,a) ]

where Q(s,a) is the Q value of state s and action a, α is the learning rate, γ is the discount factor, r is the reward, s′ is the updated state, and a′ is the updated action.
CN202311205191.0A 2023-09-18 2023-09-18 Mobile robot control method and system in dynamic environment Active CN117111522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311205191.0A CN117111522B (en) 2023-09-18 2023-09-18 Mobile robot control method and system in dynamic environment


Publications (2)

Publication Number Publication Date
CN117111522A CN117111522A (en) 2023-11-24
CN117111522B true CN117111522B (en) 2024-03-12

Family

ID=88810999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311205191.0A Active CN117111522B (en) 2023-09-18 2023-09-18 Mobile robot control method and system in dynamic environment

Country Status (1)

Country Link
CN (1) CN117111522B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130094533A (en) * 2012-02-16 2013-08-26 인하대학교 산학협력단 Collision prevention system of mobile robot in unknown environment and method thereof
CN103558856A (en) * 2013-11-21 2014-02-05 东南大学 Service mobile robot navigation method in dynamic environment
CN109196435A (en) * 2016-12-23 2019-01-11 X开发有限责任公司 Multi-agent coordination under sparse networks
CN113419524A (en) * 2021-06-10 2021-09-21 杭州电子科技大学 Robot path learning and obstacle avoidance system and method combining deep Q learning
CN114460933A (en) * 2021-12-30 2022-05-10 南京理工大学 Mobile robot local path planning algorithm for dynamic environment
WO2022161637A1 (en) * 2021-02-01 2022-08-04 Abb Schweiz Ag Visualization of a robot motion path and its use in robot path planning
CN115629607A (en) * 2022-10-25 2023-01-20 湖北汽车工业学院 Reinforced learning path planning method integrating historical information
CN115793657A (en) * 2022-12-09 2023-03-14 常州大学 Distribution robot path planning method based on temporal logic control strategy
CN116301027A (en) * 2023-02-08 2023-06-23 北京航空航天大学 Method for planning path of unmanned aerial vehicle in urban airspace based on safety reinforcement learning


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Local path planning method based on collision prediction in dynamic environments; Gao Yang; Sun Shudong; He Dongfeng; China Mechanical Engineering; 2009-11-10 (No. 21); full text *
Adaptive cooperative hunting of multiple mobile robots based on fuzzy logic; Wang Fei; Wen Shiguang; Wu Chengdong; Wei Wei; CAAI Transactions on Intelligent Systems; 2011-02-15 (No. 01); full text *
Research on robot motion control for formation and dynamic obstacle avoidance based on community networks; Kong Yifei; China Masters' Theses Full-text Database, Information Science and Technology; 2016-05-31; full text *
Optimization and research of robot path planning methods satisfying temporal task constraints; Fan Zhenyong; China Masters' Theses Full-text Database, Information Science and Technology; 2021-10-31; full text *


Similar Documents

Publication Publication Date Title
Shaw Fuzzy control of industrial systems: theory and applications
US11455576B2 (en) Architecture for explainable reinforcement learning
Abdullah et al. Generating university course timetable using genetic algorithms and local search
Lewis et al. Reinforcement learning and approximate dynamic programming for feedback control
US20210278825A1 (en) Real-Time Production Scheduling with Deep Reinforcement Learning and Monte Carlo Tree Research
Yuce et al. An ANN-GA semantic rule-based system to reduce the gap between predicted and actual energy consumption in buildings
CN111915073A (en) Short-term prediction method for intercity passenger flow of railway by considering date attribute and weather factor
Dubois et al. Decision-making under ordinal preferences and comparative uncertainty
US20210065006A1 (en) Construction sequencing optimization
CN106022549A (en) Short term load predication method based on neural network and thinking evolutionary search
Werbos Reinforcement learning and approximate dynamic programming (RLADP)—foundations, common misconceptions, and the challenges ahead
CN117111522B (en) Mobile robot control method and system in dynamic environment
Saridis Entropy in control engineering
Vasylkiv et al. Fuzzy model of the IT project environment impact on its completion
Valavanis et al. A general organizer model for robotic assemblies and intelligent robotic systems
Albelwi A Robust Energy Consumption Forecasting Model using ResNet-LSTM with Huber Loss
Shin et al. Production and inventory control of auto parts based on predicted probabilistic distribution of inventory
Gholamian et al. Meta knowledge of intelligent manufacturing: an overview of state-of-the-art
Al-Tabtabai et al. Construction project control using artificial neural networks
Zhang et al. Intelligent Building Construction Cost Prediction Based on BIM and Elman Neural Network
Xu et al. Load forecasting research based on high performance intelligent data processing of power big data
Aghapour et al. A novel approach for solving the fully fuzzy bi-level linear programming problems
Jiang et al. Deep reinforcement learning algorithm for solving material emergency dispatching problem
Anh et al. Modeling identification of the nonlinear robot arm system using miso narx fuzzy model and genetic algorithm
Roozemond et al. Usability of intelligent agent systems in urban traffic control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant