CN117111522B - Mobile robot control method and system in dynamic environment - Google Patents

Mobile robot control method and system in dynamic environment Download PDF

Info

Publication number
CN117111522B
CN117111522B (granted publication of application CN202311205191.0A)
Authority
CN
China
Prior art keywords
state
risk
model
mobile robot
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311205191.0A
Other languages
Chinese (zh)
Other versions
CN117111522A (en)
Inventor
宓建
邓社军
徐伟
廖华军
白乐濛
张俊
秦婧逸
于世军
嵇涛
徐悦
马瑞阳
沈梓怡
朱云翔
蔡爱鹏
崔嘉贺
张昱韬
闫奇志
张洋铭
张炳坤
艾尔帕尼·茹扎洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huizhi Technology Yangzhou Co ltd
Yangzhou University
Original Assignee
Huizhi Technology Yangzhou Co ltd
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huizhi Technology Yangzhou Co ltd, Yangzhou University filed Critical Huizhi Technology Yangzhou Co ltd
Priority to CN202311205191.0A priority Critical patent/CN117111522B/en
Publication of CN117111522A publication Critical patent/CN117111522A/en
Application granted granted Critical
Publication of CN117111522B publication Critical patent/CN117111522B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/04 Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042 Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G05B19/0423 Input/output
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/25 Pc structure of the system
    • G05B2219/25257 Microcontroller
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a mobile robot control method and system in a dynamic environment, the method comprising the following steps: constructing an encoder based on the scLTL logic specification, and encoding the multi-dimensional task state of the mobile robot into a one-dimensional state; predicting and evaluating the risks of uncertain factors in the mobile robot's working environment based on historical data and risk simulation; and fusing the encoder's encoding with reinforcement learning, constructing a fusion model that takes the predicted risks into account, and obtaining decisions and path planning for the mobile robot's multi-task operation through the fusion model. The invention adopts a logic specification for multi-task control, considers the risks of uncertain factors in a dynamic environment, and proposes an encoding reinforcement learning fusion algorithm that performs safe path planning, avoids risks in the environment, and generates an optimal safe path for the mobile robot to execute multi-task operations. Potential risks in the environment are avoided in advance, complex problems are simplified, the exploration space is compressed, the calculation cost is reduced, and the solving speed is improved.

Description

Mobile robot control method and system in dynamic environment
Technical Field
The invention relates to logistics robot operation, in particular to a mobile robot control method and system in a dynamic environment.
Background
With the advancement of artificial intelligence technology, the demand for robots that perform multiple tasks is increasing; however, robot control technology for multi-task execution still needs improvement, especially multi-task control and safe path planning under uncertain risks. One typical application scenario is the distribution of materials between university campuses. Many universities now have multiple campuses distributed across a city; the distance between campuses makes material transfer inconvenient, and the transport of files, materials, and articles across campuses relies heavily on manual delivery, which wastes time and effort and affects the office efficiency of staff and the daily life of students. Developing a robot control system suitable for multi-task execution and realizing automatic cross-campus distribution would provide great convenience for university teachers and students.
The main difficulties the robot faces in performing multiple tasks are multi-task control and safe path planning. Existing research shows that a plain reinforcement learning algorithm can effectively solve the robot path planning problem in the single-task case. However, it cannot handle multiple tasks: for multi-task operation, the task must be divided into several stages and multiple reward and punishment functions must be defined, which is complex. Linear temporal logic specifications can effectively perform multi-task control and can be fused with reinforcement learning, so that task control and path planning can be solved together. However, the existing fusion methods of linear temporal logic and reinforcement learning are mainly product-type, their calculation cost is high, and they do not consider the risk caused by uncertain factors in the environment.
Disclosure of Invention
The invention aims to provide a mobile robot control method and a mobile robot control system in a dynamic environment, so that potential risks in the environment are avoided in advance, complex problems are simplified, an exploration space is compressed, calculation cost is reduced, and solving speed is improved.
The technical scheme for realizing the purpose of the invention is as follows:
a mobile robot control method under dynamic environment includes the steps:
constructing an encoder based on scLTL logic specifications, and encoding the multi-dimensional task state of the mobile robot into a one-dimensional state;
predicting and evaluating risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation;
and (3) coding and reinforcement learning by a fusion coder, constructing a fusion model by taking prediction risks in the environment into consideration, and obtaining the multi-task operation of the mobile robot through the fusion model to carry out decision making and path planning.
Further, constructing an encoder based on the scLTL logic specification specifically includes:
multi-task atomic proposition: according to each single-task state τ_i, atomic propositions are made for the I tasks, and the constructed scLTL logic specification task model φ is written;
converting the task model φ into a finite state automaton A_φ;
constructing an encoder based on the finite state automaton A_φ and the robot multi-task state.
Further, the scLTL logic specification task model φ follows the standard scLTL syntax:
φ ::= T | p | ¬p | φ_1 ∧ φ_2 | φ_1 ∨ φ_2 | Xφ | φ_1 U φ_2 | Fφ, with p ∈ PrP,
wherein T represents the Boolean operator true, PrP is the atomic proposition set, φ_1, φ_2 represent atomic propositions, X represents the next step (Xφ means φ is true in the next state), F represents the future (Fφ means φ is true at some future point), and φ_1 U φ_2 means φ_1 is true until φ_2 is satisfied.
Further, the finite state automaton A_φ is:
A_φ = <Q, 2^PrP, δ, q_0, q_F>
wherein Q represents a finite state set, 2^PrP represents the finite alphabet over the atomic proposition set, δ: Q × 2^PrP → Q is the state transition function, q_0 represents the initial state, and q_F represents the finite set of accepting states.
Further, the encoder is:
m = Encoder(s_T) = Encoder(τ_1, τ_2, …, τ_i, …, τ_I)
wherein m ∈ M is the encoded robot multi-task state and is one-dimensional, and the multi-task state is s_T = (τ_1, τ_2, …, τ_i, …, τ_I).
Further, predicting and evaluating the risks of uncertain factors in the working environment of the mobile robot based on historical data and risk simulation specifically comprises the following steps:
constructing a historical risk model based on the historical risk data, iteratively learning the historical risk model until the requirement is met, and predicting the risk within a future time period T_0 through the historical risk model;
simulating possible sudden events in the real environment, constructing a random simulation model, and calculating the risk of each road within the future time period T_0 based on a statistical method.
Further, for the historical risk model:
wherein the historical risk data set of the time period T is the model input, c is calculated by a kernel equation, the model output value and the learning parameters α, β, ε are determined through repeated iterative learning, and k is the path number;
the iterative learning process of the model is performed by a minimization in which C_1, C_2 are matrices calculated from the kernel function, γ is a constant, K is the dimension of the training data, tr(·) is the trace of a matrix, J is the data length, d is the dimension of the model constructed by learning, and the training data set is used;
the risk within the future time period T_0 is then predicted by the model,
wherein Y_t is the risk model value at time t, and the transpose of the constructed risk model data set is used.
Further, in the stochastic simulation model, the model output is the risk information of road segment k at time t, and E_f is the emergency information;
the random risk within the future time period T_0 is obtained by the simulation;
the risk within the future time period T_0 is then calculated, wherein α′, β′ are balance parameters and a Gaussian noise term is included.
Further, the fusion model is defined by a tuple comprising: the encoded state set S (with x and y the robot environment coordinate values), a finite action set A, a state transition probability function P, an initial state s_0, a reward function R, the atomic proposition set PrP, a label function L, a finite set of accepting states F, and the risk information R_isk.
Further, the fusion model adopts Q-Learning for policy learning and updates the Q value using the Bellman equation until convergence, the Bellman equation being:
Q(s, a) ← Q(s, a) + α″ [ r + γ′ max_{a′} Q(s′, a′) − Q(s, a) ]
where Q(s, a) is the Q value of state s and action a, α″ is the learning rate, γ′ is the discount factor, r is the reward, s′ is the updated state, and a′ is the updated action.
A mobile robot control system in a dynamic environment comprises a multi-task encoder module, a risk prediction evaluation module and a path planning module, wherein:
the multi-task encoder module constructs an encoder based on scLTL logic specifications, and encodes the multi-dimensional task state of the mobile robot into a one-dimensional state;
the risk prediction evaluation module predicts and evaluates risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation;
the path planning module fuses the coding of the coder and reinforcement learning, and considers the prediction risk in the environment to construct a fusion model, and the multi-task operation of the mobile robot is obtained through the fusion model to carry out decision making and path planning.
Compared with the existing mobile robot control, the invention has the following beneficial effects:
(1) The scLTL logic formula is used for multi-task management, a robot multi-task state encoder is constructed, and the multi-task state of the robot is converted into a one-dimensional single state, compressing the state space; the more tasks there are, the more pronounced the compression becomes;
(2) Risks caused by uncertain factors in the environment are considered, and an optimal risk-avoiding operation path is planned for the robot at the planning level, improving the robot's safety guarantee; this is a problem that neither the traditional MDP algorithm nor the product-type scLTL-MDP algorithm can solve;
(3) Compared with the traditional MDP and scLTL algorithms, the proposed algorithm architecture is more effective and its calculation cost is lower.
Drawings
FIG. 1 is a diagram of a control system architecture of the present invention.
Fig. 2 is a block diagram of a multitasking control module.
FIG. 3 is a block diagram of risk prediction evaluation.
Fig. 4 is a block diagram of path planning.
FIG. 5 is a flow chart of a control method of the present invention.
Fig. 6 is a simplified map of the Yangzijin and Lotus Pond campuses of Yangzhou University.
Fig. 7 shows path planning under smooth on-campus traffic: fig. 7(a) is the path plan for the Yangzijin campus, and fig. 7(b) is the path plan for the Lotus Pond campus.
Fig. 8 shows results that consider the risk of on-campus traffic congestion: fig. 8(a) is a schematic diagram of the operation path, and fig. 8(b) shows the relationship between reward and number of learning iterations.
FIG. 9 is a graph of experimental results in a 24×60 environment.
Fig. 10 is a diagram of learning process and state compression results.
Detailed Description
The invention provides a mobile robot control method that considers the risks of uncertain factors in a dynamic environment, and provides a mobile control system for multi-task robot operation under environmental risk in an uncertain environment. The system adopts the syntactically co-safe linear temporal logic (scLTL) specification for multi-task control, considers the risks of uncertain factors in a dynamic environment, proposes an encoding reinforcement learning fusion algorithm, performs safe path planning, avoids environmental risks at the planning level, generates an optimal safe path for the mobile robot to execute multi-task operations, and avoids potential risks in the environment in advance. The proposed algorithm first uses the scLTL specification to express the multiple tasks as atomic propositions and manages them with a single formula. Second, it constructs a task encoder that encodes the multi-dimensional multi-task state into a one-dimensional single state, yielding a multi-task-oriented encoding reinforcement learning algorithm that converts the complex multi-task path planning problem into the problem of finding an optimal strategy satisfying the scLTL formula; this simplifies the problem, compresses the exploration space, reduces the calculation cost, and improves the solving speed. Finally, risks caused by uncertain factors in the dynamic environment are incorporated into the algorithm, and a safety policy solving algorithm oriented to dynamic environment risk is constructed; this solves the problem that the traditional Markov Decision Process (MDP) cannot be applied to dynamic environments, avoids risks in advance at the planning level, and provides an optimal safe control policy and operation path for mobile robot operation.
As shown in fig. 1, the risk-aware mobile robot control system provided in this embodiment is composed of the following 3 parts:
S1: multi-task encoder;
S2: risk prediction and evaluation;
S3: path planning.
The key technical principles of the invention are as follows:
s1 multitasking coding controller
As shown in the S1 part of fig. 1, the present project mainly performs multitasking control of a robot through scLTL logic specification, so as to construct an encoder, and encode a multidimensional task state into a one-dimensional state, thereby compressing a state space, and the technical principle is as follows:
firstly, the multitasking atoms are made into questions, the formula (1) is utilized to carry out arrangement expression,
wherein:
T represents the Boolean operator "true";
PrP is the atomic proposition set;
φ, φ_1, φ_2 represent atomic propositions;
X (next) represents the next step; Xφ means φ is true in the next state;
F (future) represents the future; Fφ = T U φ means φ is true at some future point;
U (until): φ_1 U φ_2 means φ_1 is true until φ_2 is satisfied.
Through this logical expression, the atomic propositions of all tasks can be written into formula (1), so that they are controlled and managed in a unified way.
Secondly, a finite state automaton A_φ is constructed, converting the scLTL formula (1) φ into the finite state automaton (FSA) A_φ:
A_φ = <Q, 2^PrP, δ, q_0, q_F>……(2),
wherein Q: finite state set; 2^PrP: finite alphabet over the atomic propositions; δ: Q × 2^PrP → Q state transition function; q_0: initial state; q_F: finite set of accepting states. Task progress is tracked through state transitions of the FSA A_φ.
The robot executes I tasks, and the multi-task state is s_T = (τ_1, τ_2, …, τ_i, …, τ_I), i.e., the task state of the robot is defined by the high-dimensional s_T, whose dimension is I. The number of states of each single task is determined by |τ_i|. The multi-task state space is of size |τ_i|^I; the whole state space grows exponentially as the number of tasks I increases, which increases the difficulty of solving the robot's optimal control strategy.
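As a concrete illustration of the FSA state transitions described above, the following Python sketch tracks a two-task specification F c_1 ∧ F c_2 (both pick-ups must eventually be completed, in any order); the state representation and function names are illustrative assumptions, not taken from the patent:

```python
# Hypothetical FSA for the scLTL formula "F c1 AND F c2": each automaton
# state records which atomic propositions have been satisfied so far.

def step(state, labels):
    """Transition function delta: Q x 2^PrP -> Q for this two-task example."""
    return frozenset(set(state) | (labels & {"c1", "c2"}))

q0 = frozenset()                 # initial state q_0: nothing done yet
qF = frozenset({"c1", "c2"})     # accepting state q_F: both tasks done

q = q0
for observed in [set(), {"c1"}, set(), {"c2"}]:  # labels observed along a run
    q = step(q, observed)
assert q == qF                   # this run satisfies the specification
```

Because the FSA state depends only on which propositions have been seen, the order in which c1 and c2 occur does not matter, mirroring the "in any order" reading of F c_1 ∧ F c_2.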
As shown in FIG. 2, the S1 module mainly consists of the multi-task atomic proposition (S1-1), the finite state automaton construction (S1-2), and the constructed encoder Encoder(·) (S1-3).
Multi-task atomic proposition (S1-1): according to each single-task state τ_i, atomic propositions are made for the I tasks, and the formula φ is written;
Constructing a finite state automaton (S1-2): the task formula φ is converted into the finite state automaton A_φ;
Building the encoder (S1-3): based on the finite state automaton A_φ (its state transition function δ) and the robot multi-task state s_T = (τ_1, τ_2, …, τ_i, …, τ_I), an encoder is constructed, defined by equation (3):
m = Encoder(s_T) = Encoder(τ_1, τ_2, …, τ_i, …, τ_I)……(3),
wherein m ∈ M is the encoded robot multi-task state and is one-dimensional, so that the high-dimensional robot multi-task state is converted into a one-dimensional state and the state space is compressed. Since |M| < |τ_i|^I, the space-compression effect becomes more pronounced as the number of tasks increases.
The inputs and outputs of the S1 module are as follows:
Input: robot multi-task state s_T = (τ_1, τ_2, …, τ_i, …, τ_I);
Output: the robot multi-task encoded state m.
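To make equation (3) concrete, here is a minimal encoder sketch that assumes, for illustration, a fixed number of discrete states per task; the patent's encoder additionally prunes unreachable combinations via the FSA, which is why its |M| is smaller than this full product:

```python
# Mixed-radix sketch of m = Encoder(s_T): flatten the I-dimensional task
# state (tau_1, ..., tau_I) into a single integer index m.

def encode(task_states, radices):
    m = 0
    for tau, radix in zip(task_states, radices):
        m = m * radix + tau      # shift previous digits, append tau
    return m

# Three tasks, each with 2 states (0 = pending, 1 = completed): 2^3 codes.
radices = (2, 2, 2)
assert encode((0, 0, 0), radices) == 0
assert encode((1, 0, 1), radices) == 5
assert encode((1, 1, 1), radices) == 7
```

The encoding is invertible (it is just positional notation), so a planner can work entirely in the one-dimensional index m and recover the per-task states when needed.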
S2: risk prediction assessment
As shown in part S2 of fig. 1, this module predicts and evaluates the risks of uncertain factors in the environment, such as the risk of traffic congestion (congestion is only an example; risks caused by any uncertain factor in the environment are applicable).
As shown in FIG. 3, the S2 module mainly comprises a historical risk model (S2-1), a risk prediction (S2-2), a random simulation mechanism (S2-3), a risk simulation (S2-4) and a risk calculation (S2-5) module.
The historical risk model (S2-1) is constructed based on historical risk data. The historical risk data set for the time period T contains, for each road segment numbered k, its traffic risk information within that period. The risk model based on historical data is constructed as follows:
(1) Initialize the model data and the model construction parameters α, β, ε;
(2) The learning model is defined by formulas (4)-(7), wherein c can be calculated from a kernel equation, and α, β, ε are obtained through repeated iterative learning;
(3) The whole learning process is carried out by minimizing formula (8), wherein C_1, C_2 are matrices calculated from the kernel function, K is the dimension of the training data, tr(·) is the trace of a matrix, J is the data length, d is the dimension of the model constructed by learning, the integrated data set is used, and γ is a constant;
(4) Iterate the learning L times, thereby constructing the risk model based on historical data.
Risk prediction (S2-2) predicts risk based on the historical risk model; the risk within a future time period T_0 can be predicted by formula (9), wherein Y_t is the risk model value at time t and the transpose of the risk model data set constructed in the low-dimensional space is used.
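Since formulas (4)-(9) are given only in the patent drawings, the following stand-in illustrates the general idea of history-based risk prediction with a Gaussian-kernel weighted average; the function, its parameters, and the data are illustrative assumptions, not the patent's actual model:

```python
# Predict the risk of one road at a future time as a kernel-weighted
# average of its historical (time, risk) observations.

import math

def predict_risk(history, horizon, bandwidth=2.0):
    t_query = history[-1][0] + horizon            # time T0 ahead of the last sample
    weights = [math.exp(-(t - t_query) ** 2 / (2 * bandwidth ** 2))
               for t, _ in history]
    return sum(w * r for w, (_, r) in zip(weights, history)) / sum(weights)

hist = [(0, 0.1), (1, 0.2), (2, 0.4), (3, 0.5)]   # congestion risk rising over time
risk = predict_risk(hist, horizon=1)
assert 0.1 <= risk <= 0.5    # a weighted average stays within the data range
```

Recent observations get higher kernel weight, so the prediction leans toward the latest trend, which matches the idea of extrapolating from historical experience.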
The main drawback of this approach is that the model is built from historical data, so risk prediction relies too heavily on historical experience; real situations are highly random, and prediction based on a historical-data model alone cannot handle this randomness well. To overcome this deficiency, the invention provides a new risk calculation method: the problem is addressed by the random policy simulation mechanism (S2-3) and risk simulation (S2-4).
Random policy simulation mechanism (S2-3): this mechanism simulates emergency events that may occur in the real environment through a stochastic simulation model, defined by formula (10), wherein the model output is the risk information of road segment k at time t and E_f is the emergency information.
Risk simulation (S2-4): based on the random policy simulation mechanism (S2-3) and the current state, a large number of random risk simulations are performed, and the risk of each road within the future time period T_0 is calculated using a statistical method, giving the random risk within T_0.
Risk calculation (S2-5): the risk within the future time period T_0 is calculated by formula (11), wherein α′, β′ are balance parameters and a Gaussian noise term is included. Note that the first term is the risk predicted from historical data, which focuses on past history and lacks responsiveness to sudden uncertainty, while the second term is the random risk calculated by random simulation, which focuses on the randomness and uncertainty of real conditions and compensates for the deficiency of the history-based prediction.
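A minimal sketch of S2-3 through S2-5 combined, assuming a Bernoulli emergency model and illustrative balance parameters (the Gaussian noise term of formula (11) is omitted here for determinism; all names and constants are assumptions):

```python
# Monte Carlo random-risk simulation (S2-4) plus the blended risk of (S2-5).

import random

def simulate_risk(base_risk, emergency_prob, n_runs=10_000, seed=0):
    """Estimate the probability that a road is risky within T0, allowing a
    random emergency event E_f to independently trigger risk."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_runs)
               if rng.random() < base_risk or rng.random() < emergency_prob)
    return hits / n_runs

def combined_risk(hist_risk, sim_risk, alpha_p=0.6, beta_p=0.4):
    # Blend history-based and simulation-based risk (cf. formula (11)).
    return alpha_p * hist_risk + beta_p * sim_risk

r_sim = simulate_risk(base_risk=0.2, emergency_prob=0.05)
r = combined_risk(hist_risk=0.3, sim_risk=r_sim)
assert 0.0 < r < 1.0
```

With independent triggers, the true per-run risk probability is 1 − (1 − 0.2)(1 − 0.05) = 0.24, and the Monte Carlo estimate converges to it as n_runs grows.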
In summary, the input and output of the S2 module are as follows:
Input: environmental information X_t;
Output: risk information R_isk.
Taking traffic congestion risk as an example: the input is the traffic environment information at the current time t, and the output is the traffic congestion risk probability of each road section and of the road network over a future period.
S3: path planning
As shown in part S3 of fig. 1, this module fuses scLTL and reinforcement learning and considers risks in the environment, forming a new fusion algorithm that makes decisions and plans paths for the mobile robot's multi-task operation.
As shown in FIG. 4, the encoded scLTL and reinforcement learning fusion algorithm fuses the encoding of the finite state automaton (S3-1) with the MDP (Markov Decision Process) (S3-2). The inputs and outputs of the S3 module are as follows:
Input: task information, risk information, finite state automaton A_φ;
Output: multi-task policies and paths.
The main technical principle is as follows:
the proposed fusion algorithm model consists ofDefinition, wherein->Is the encoded state set,/->x and y are robot environment coordinate values, < + >>Is a finite set of actions, < >>For the state transition probability function +.> In the initial state of the device, the device is in a state of being in an initial state,for the reward function, prP is an atomic proposition set, L: s2 PrP For the tag function +.>R is a finite accepted state set isk Is risk information.
The conventional MDP or scLTL-MDP fusion algorithm is a product-type algorithm: the whole exploration space is enlarged, and it cannot handle environmental risk. Compared with these traditional algorithms, the algorithm provided by the invention constructs a robot multi-task state encoder, thereby converting the high-dimensional robot multi-task state into a one-dimensional single state, realizing space compression, and, based on the environmental risk R_isk, solving an optimal strategy that can cope with dynamic environment risk.
The project uses Q-Learning for policy learning and updates the Q value with the Bellman equation until convergence:
Q(s, a) ← Q(s, a) + α″ [ r + γ′ max_{a′} Q(s′, a′) − Q(s, a) ]
where Q(s, a) is the Q value of state s and action a, α″ is the learning rate, γ′ is the discount factor, and r is the reward.
The algorithm pseudocode is as follows:
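The pseudocode itself appears only in the patent drawings; the sketch below shows the same tabular Q-Learning update on a toy 1-D corridor with one risky cell, using the reward structure of Table 1. The environment, constants, and penalty values are illustrative assumptions, not the patent's implementation:

```python
import random

N, GOAL, RISKY = 6, 5, 3            # corridor cells 0..5, goal at 5, risky cell at 3
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
rng = random.Random(0)

def reward(s_next):
    r = -0.1                        # basic step cost (Table 1)
    if s_next == RISKY:
        r -= 1.0                    # congestion-risk penalty (r = -con)
    if s_next == GOAL:
        r += 1.0                    # task completion (scLTL formula satisfied)
    return r

for _ in range(500):                # learning episodes
    s = 0
    while s != GOAL:
        if rng.random() < EPS:      # epsilon-greedy action selection
            a = rng.choice((-1, 1))
        else:
            a = max((-1, 1), key=lambda b: Q[(s, b)])
        s2 = min(max(s + a, 0), N - 1)
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, b)] for b in (-1, 1))
        # Bellman update: Q += alpha * (r + gamma * max_a' Q(s',a') - Q)
        Q[(s, a)] += ALPHA * (reward(s2) + GAMMA * best_next - Q[(s, a)])
        s = s2

assert Q[(0, 1)] > Q[(0, -1)]       # the learned policy heads toward the goal
```

In the patent's setting, the state index would be the encoded pair (m, x, y) and the risk penalty would come from the S2 module rather than a fixed cell, but the update rule is the same.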
a flowchart of a system developed by the present invention is shown in fig. 5, comprising:
the first step: initializing various parameters;
and a second step of: multitasking;
and a third step of: utilizing scLTL to carry out multitasking atomic proposition, constructing a corresponding finite state automaton, and carrying out multitasking management;
fourth step: building an encoder
Fifth step: comprehensively considering environmental information and uncertain risks by using the proposed path planning algorithm, and carrying out path planning to generate a multi-task operation strategy and a path;
sixth step: the robot executes multitasking operation;
seventh step: if the robot encounters operation interruption in the executing process, returning to the second step, and re-planning the path;
eighth step: the job ends.
The invention aims to provide an optimal control strategy for the robot to execute multi-task operations and to plan a safe operation path that accounts for risks caused by uncertain factors in the environment. On one hand, the invention uses the scLTL logic specification to express all tasks as atomic propositions forming a logic formula, constructs an encoder, effectively controls and uniformly manages the multiple tasks, and encodes the multi-dimensional task state into a one-dimensional state, thereby compressing the state space and reducing the calculation cost. On the other hand, the invention considers uncertain risks in the environment, provides an encoding-based scLTL-MDP algorithm, calculates the risk distribution caused by uncertain factors in the environment by fusing a historical data model with a simulation method, generates an optimal safety strategy for the mobile robot to execute multiple tasks in an uncertain environment, avoids risks in advance, and provides an optimal safe path for the robot's multi-task operation.
The invention can be applied to robot multitasking, and mainly comprises the fields of automation and logistics transportation, such as file material distribution, factory automation distribution and logistics distribution in each school district of a college.
Examples
The automatic distribution operation control of robots across the campuses of Yangzhou University is used as an application case for explanation.
Fig. 6 shows a simplified map of the Yangzijin and Lotus Pond campuses of Yangzhou University. After picking up items in the Yangzijin campus, the robot rides the school bus to the Lotus Pond campus for delivery service.
As shown in figure 6, the robot starts from s_0 in the Yangzijin campus and goes to the pick-up points C_1 (west apartment), C_2, and C_3 to pick up items. After pick-up is completed, it carries the items to the east gate, rides the school bus to the Lotus Pond campus, alights at s_1, goes to the delivery point at administrative building D_1, and delivers all the files at the administrative building.
All the task atomic propositions are controlled by formula (1) in the specific implementation. The task atomic propositions are defined as follows:
(1) Atomic proposition c_i represents successful completion of picking up items at pick-up point C_i; for example, c_1 represents successful completion of the pick-up task at the apartment;
(2) Atomic proposition b represents successfully boarding the school bus to the next campus; for example, boarding at the east gate to go to the Lotus Pond campus;
(3) Atomic proposition d_i represents successful completion of the delivery task at point D_i; for example, d_1 indicates the delivery task is completed at the D_1 administrative building. Note that in the illustrated example, all the collected file materials are delivered to the administrative building in the Lotus Pond campus, i.e., the delivery points corresponding to the three propositions d_1, d_2, d_3 are in fact one delivery point.
The propositions are written into the scLTL formula φ, whose structure (reconstructed from the definitions below) is:
φ = (∧_i F c_i) ∧ F b ∧ (∧_i F d_i) ∧ (¬b U (c_1 ∧ c_2 ∧ c_3)) ∧ (∧_i (¬d_i U b)) ∧ (∧_i (¬d_i U c_i)),
wherein Fc_i indicates that the robot will successfully complete pick-up at point C_i in the future; Fb indicates that the robot will successfully ride the school bus to the next campus; ¬b U (c_1 ∧ c_2 ∧ c_3) indicates that the robot cannot go to the next campus before completing the pick-up tasks; Fd_i indicates that the robot will successfully complete the delivery task at point D_i in the future; ¬d_i U b indicates that the robot will not perform the D_i delivery task until it boards the school bus to the next campus; ¬d_i U c_i indicates that if the robot has not collected the documents at point C_i, it will not perform the D_i delivery task.
The reward function of the proposed algorithm is defined in the following table.
TABLE 1 Reward function definition table

Reward      Definition
r = -0.1    basic step cost
r = +1.0    scLTL formula φ satisfied
r = -con    traffic congestion risk

wherein con is a constant determined by the traffic congestion risk distribution.
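Read as code, Table 1 suggests a per-step reward of the following shape; the argument names and the use of the predicted risk as a multiplier on `con` are assumptions for illustration, not the patent's exact definition:

```python
# Reward function sketch following Table 1.

def reward(phi_satisfied, congestion_risk, con=1.0):
    r = -0.1                       # basic step cost
    if phi_satisfied:
        r += 1.0                   # scLTL formula phi satisfied
    r -= con * congestion_risk     # traffic congestion risk penalty
    return r

assert reward(False, 0.0) == -0.1
assert abs(reward(True, 0.5) - 0.4) < 1e-9
```

Scaling the penalty by the predicted risk lets the planner trade a longer detour against the expected cost of entering a congested road section.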
The source code of the project algorithm is written in Python, and the operation paths generated by solving the algorithm are shown in fig. 7 and fig. 8, where fig. 7 shows path planning under smooth on-campus traffic and fig. 8 considers the risk of on-campus traffic congestion.
Fig. 7 corresponds to fig. 6: the dark blue grid cells are buildings and other impassable areas, the light blue cells form the robot's operation path, the green point is the departure position, and the purple points are positions where the robot performs delivery tasks.
After the simulation experiment without on-campus congestion risk, a simulation experiment with traffic congestion risk in the Yangzijin campus was designed. As shown in fig. 8(a), the red grid cells represent road sections with congestion risk. The simulation results show that the proposed algorithm effectively solves the path planning problem under congestion risk: it avoids road sections with higher risk and provides a safe, efficient operation path for the robot. Fig. 8(b) shows the change of reward with the number of learning iterations during training; the reward no longer increases after about 200 iterations and reaches convergence, i.e., the optimal strategy has been learned.
State space compression: as shown in Fig. 9, taking a 24×60 environment with 6 tasks as an example, red road segments mark congestion risk, with darker red indicating a higher risk level. Fig. 10 shows the learning convergence and the state space size; the state space is reduced by 76.6%.
The test results show that the proposed algorithm can effectively control the multi-task operation of the robot, generating an effective and safe operation path for robot operation in a risky environment.

Claims (9)

1. A mobile robot control method in a dynamic environment, characterized by comprising the following steps:
step 1: constructing an encoder based on scLTL logic specifications, and encoding the multi-dimensional task state of the mobile robot into a one-dimensional state;
step 2: predicting and evaluating risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation;
step 3: coding and reinforcement learning by a fusion coder, constructing a fusion model by taking prediction risks in the environment into consideration, and obtaining a multi-task operation of the mobile robot through the fusion model to carry out decision making and path planning;
the fusion modelWherein (1)>Is the encoded state set,/->m is Q, x, y is the robot environment coordinate value, < + >>Is a finite set of actions, < >>For the state transition probability function +.>For the initial state +.>As a reward function, prP is an atomic proposition set, L is a tag function, ++>R is a finite accepted state set isk Is risk information;
and the fusion model adopts Q-Learning for strategy learning, updating the Q value with the Bellman equation until convergence, the Bellman equation being:

Q(s,a) ← Q(s,a) + α [ r + γ max_{a′} Q(s′,a′) − Q(s,a) ]

where Q(s,a) is the Q value of state s and action a, α is the learning rate, γ is the discount factor, r is the reward, s′ is the updated state, and a′ is the updated action.
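The Q-Learning update in this claim can be sketched as a single tabular update step. The function name and hyperparameter values are illustrative assumptions, not the patent's implementation:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step implementing the Bellman update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action) pairs, defaulting to 0.0."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # max over next actions
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]
```

Starting from an all-zero table, a transition with reward 1.0 updates Q(s,a) to alpha × 1.0.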
2. The method of claim 1, wherein constructing an encoder based on the scLTL logic specification comprises:
forming multi-task atomic propositions: performing an atomic proposition for each task i based on its single task state τ_i, and writing the constructed scLTL logic specification task model φ;
converting the task model φ into a finite state automaton A_φ;
constructing the encoder based on the finite state automaton A_φ and the robot multi-task state.
3. The method for controlling a mobile robot in a dynamic environment according to claim 2, wherein the scLTL logic specification task model φ is given by the grammar:

φ ::= T | p | ¬p | φ_1 ∧ φ_2 | φ_1 ∨ φ_2 | Xφ | Fφ | φ_1 U φ_2

wherein T represents the Boolean constant true; p ∈ PrP is an atomic proposition, with PrP the atomic proposition set; φ_1, φ_2 represent scLTL formulas; X represents the next-step operator, Xφ meaning that φ is true in the next state; F represents the future operator, Fφ meaning that φ is true at some future state; and φ_1 U φ_2 represents that φ_1 remains true until φ_2 is satisfied.
4. The method for controlling a mobile robot in a dynamic environment according to claim 2, wherein the finite state automaton A_φ is:

A_φ = ⟨Q, 2^PrP, δ, q_0, q_F⟩

wherein Q represents a finite state set; 2^PrP represents the finite input alphabet formed by subsets of the atomic proposition set; δ: Q × 2^PrP → Q represents the state transition function; q_0 represents the initial state; and q_F represents the finite set of accepting states.
5. The method for controlling a mobile robot in a dynamic environment according to claim 1, wherein the encoder is:

m = Encoder(s_T) = Encoder(τ_1, τ_2, …, τ_i, …, τ_I)

wherein m ∈ Q is the encoded robot multi-task state, of dimension one, and s_T = (τ_1, τ_2, …, τ_i, …, τ_I) is the multi-task state.
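One minimal way to realize such an encoder is a mixed-radix mapping from the task tuple to a single integer. This generic sketch is illustrative only; the patent derives the one-dimensional state from the automaton A_φ rather than from this encoding:

```python
def build_encoder(task_sizes):
    """Return an encoder mapping a multi-task state (tau_1, ..., tau_I),
    where each tau_i takes values in {0, ..., task_sizes[i]-1}, to a single
    integer m via a mixed-radix encoding. The mapping is bijective, so no
    task information is lost in the compression to one dimension."""
    def encoder(s_T):
        m = 0
        for tau, size in zip(s_T, task_sizes):
            m = m * size + tau  # shift by this task's radix, add its digit
        return m
    return encoder
```

For example, with three tasks of sizes (2, 2, 3) the tuple (1, 1, 2) maps to (1·2 + 1)·3 + 2 = 11.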
6. The method for controlling a mobile robot in a dynamic environment according to claim 1, wherein predicting and evaluating the risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation specifically comprises:
building a historical risk model based on historical risk data, iteratively learning the historical risk model until it meets the requirement, and predicting the risk in a future time period T_0 through the historical risk model;
simulating possible emergencies in the real environment, constructing a random simulation model, and calculating the risk of each road in the future time period T_0 based on a statistical method.
7. The method for controlling a mobile robot in a dynamic environment according to claim 6, wherein, in the historical risk model:
C is the historical risk data set of time period T, computed by a kernel function; ŷ is the model output value; α, β, ε are model learning parameters determined by repeated iterative learning; and k is the path number;
the iterative learning process of the model is performed by minimizing a formula wherein C_1, C_2 are matrices computed by the kernel function, γ is a constant, K is the dimension of the training data, tr(·) is the trace of a matrix, J is the data length, d is the dimension of the model constructed by learning, and the remaining symbol denotes the training dataset;
and in the risk over the future time period T_0, Y_t is the risk model value at time t and the remaining symbol denotes the transpose of the constructed risk model data set.
8. The method for controlling a mobile robot in a dynamic environment according to claim 6, wherein, in the random simulation model, the model output is the risk information of road segment k at time t and e_f is the emergency information;
the random-simulation risk over the future time period T_0 is computed from the model;
and the overall risk over the future time period T_0 combines the historical prediction and the random-simulation risk, wherein α′, β′ are balance parameters and the remaining term is Gaussian noise.
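The weighted fusion of the two risk estimates described in this claim can be sketched as follows. The function name, weights, and noise scale are illustrative assumptions, not the patent's values:

```python
import random

def fused_risk(r_hist, r_sim, alpha_p=0.6, beta_p=0.4, sigma=0.01, rng=None):
    """Combine the historical-model risk prediction r_hist and the
    random-simulation risk r_sim (per-road lists) for the future period T0
    as a weighted sum with balance parameters alpha', beta' plus Gaussian
    noise, following the description in claim 8."""
    rng = rng or random.Random(0)  # seeded for reproducibility in this sketch
    return [alpha_p * h + beta_p * s + rng.gauss(0.0, sigma)
            for h, s in zip(r_hist, r_sim)]
```

With sigma set to zero the result reduces to the pure weighted combination, which is convenient for sanity-checking the balance parameters.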
9. A mobile robot control system in a dynamic environment, characterized by comprising a multi-task encoder module, a risk prediction and evaluation module, and a path planning module, wherein:
the multi-task encoder module constructs an encoder based on the scLTL logic specification and encodes the multi-dimensional task state of the mobile robot into a one-dimensional state;
the risk prediction and evaluation module predicts and evaluates the risks of uncertain factors in the mobile robot working environment based on historical data and risk simulation;
the path planning module fuses the encoder coding with reinforcement learning, constructs a fusion model that takes the predicted risk in the environment into account, and obtains decision making and path planning for the multi-task operation of the mobile robot through the fusion model;
the fusion model is M = ⟨S, A, P, s_0, R, PrP, L, F, Risk⟩, wherein S is the encoded state set, a state s = (m, x, y) with m ∈ Q and x, y the robot environment coordinate values; A is a finite action set; P is the state transition probability function; s_0 is the initial state; R is the reward function; PrP is the atomic proposition set; L is the label function; F is the finite set of accepting states; and Risk is the risk information;
and the fusion model adopts Q-Learning for strategy learning, updating the Q value with the Bellman equation until convergence, the Bellman equation being:

Q(s,a) ← Q(s,a) + α [ r + γ max_{a′} Q(s′,a′) − Q(s,a) ]

where Q(s,a) is the Q value of state s and action a, α is the learning rate, γ is the discount factor, r is the reward, s′ is the updated state, and a′ is the updated action.
CN202311205191.0A 2023-09-18 2023-09-18 Mobile robot control method and system in dynamic environment Active CN117111522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311205191.0A CN117111522B (en) 2023-09-18 2023-09-18 Mobile robot control method and system in dynamic environment


Publications (2)

Publication Number Publication Date
CN117111522A CN117111522A (en) 2023-11-24
CN117111522B true CN117111522B (en) 2024-03-12

Family

ID=88810999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311205191.0A Active CN117111522B (en) 2023-09-18 2023-09-18 Mobile robot control method and system in dynamic environment

Country Status (1)

Country Link
CN (1) CN117111522B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130094533A (en) * 2012-02-16 2013-08-26 인하대학교 산학협력단 Collision prevention system of mobile robot in unknown environment and method thereof
CN103558856A (en) * 2013-11-21 2014-02-05 东南大学 Service mobile robot navigation method in dynamic environment
CN109196435A (en) * 2016-12-23 2019-01-11 X开发有限责任公司 Multi-agent coordination under sparse networks
CN113419524A (en) * 2021-06-10 2021-09-21 杭州电子科技大学 Robot path learning and obstacle avoidance system and method combining deep Q learning
CN114460933A (en) * 2021-12-30 2022-05-10 南京理工大学 Mobile robot local path planning algorithm for dynamic environment
WO2022161637A1 (en) * 2021-02-01 2022-08-04 Abb Schweiz Ag Visualization of a robot motion path and its use in robot path planning
CN115629607A (en) * 2022-10-25 2023-01-20 湖北汽车工业学院 Reinforced learning path planning method integrating historical information
CN115793657A (en) * 2022-12-09 2023-03-14 常州大学 Distribution robot path planning method based on temporal logic control strategy
CN116301027A (en) * 2023-02-08 2023-06-23 北京航空航天大学 Method for planning path of unmanned aerial vehicle in urban airspace based on safety reinforcement learning


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Local path planning method based on collision prediction in dynamic environments; Gao Yang; Sun Shudong; He Dongfeng; China Mechanical Engineering; 2009-11-10 (No. 21); full text *
Adaptive cooperative hunting of multiple mobile robots based on fuzzy logic; Wang Fei; Wen Shiguang; Wu Chengdong; Wei Wei; CAAI Transactions on Intelligent Systems; 2011-02-15 (No. 01); full text *
Research on robot motion control for formation and dynamic obstacle avoidance based on community networks; Kong Yifei; China Masters' Theses Full-text Database, Information Science and Technology; 2016-05-31; full text *
Optimization and research of robot path planning methods satisfying temporal task constraints; Fan Zhenyong; China Masters' Theses Full-text Database, Information Science and Technology; 2021-10-31; full text *


Similar Documents

Publication Publication Date Title
Shaw Fuzzy control of industrial systems: theory and applications
US11455576B2 (en) Architecture for explainable reinforcement learning
Abdullah et al. Generating university course timetable using genetic algorithms and local search
Lewis et al. Reinforcement learning and approximate dynamic programming for feedback control
US20210278825A1 (en) Real-Time Production Scheduling with Deep Reinforcement Learning and Monte Carlo Tree Research
Yuce et al. An ANN-GA semantic rule-based system to reduce the gap between predicted and actual energy consumption in buildings
CN111915073A (en) Short-term prediction method for intercity passenger flow of railway by considering date attribute and weather factor
Dubois et al. Decision-making under ordinal preferences and comparative uncertainty
US20210065006A1 (en) Construction sequencing optimization
CN106022549A (en) Short term load predication method based on neural network and thinking evolutionary search
Werbos Reinforcement learning and approximate dynamic programming (RLADP)—foundations, common misconceptions, and the challenges ahead
CN117111522B (en) Mobile robot control method and system in dynamic environment
Saridis Entropy in control engineering
Vasylkiv et al. Fuzzy model of the IT project environment impact on its completion
Valavanis et al. A general organizer model for robotic assemblies and intelligent robotic systems
Albelwi A Robust Energy Consumption Forecasting Model using ResNet-LSTM with Huber Loss
Shin et al. Production and inventory control of auto parts based on predicted probabilistic distribution of inventory
Gholamian et al. Meta knowledge of intelligent manufacturing: an overview of state-of-the-art
Al-Tabtabai et al. Construction project control using artificial neural networks
Zhang et al. Intelligent Building Construction Cost Prediction Based on BIM and Elman Neural Network
Xu et al. Load forecasting research based on high performance intelligent data processing of power big data
Aghapour et al. A novel approach for solving the fully fuzzy bi-level linear programming problems
Jiang et al. Deep reinforcement learning algorithm for solving material emergency dispatching problem
Anh et al. Modeling identification of the nonlinear robot arm system using miso narx fuzzy model and genetic algorithm
Roozemond et al. Usability of intelligent agent systems in urban traffic control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant