CN109974737A

CN109974737A - Route planning method and system based on combination of safety evacuation signs and reinforcement learning

Info

Publication number: CN109974737A
Application number: CN201910289774.3A
Authority: CN
Inventors: 吕蕾; 周丽美; 赵修凯; 吕晨; 张桂娟; 刘弘
Original assignee: Shandong Normal University
Current assignee: Shandong Center Information Technology Ltd By Share Ltd
Priority date: 2019-04-11
Filing date: 2019-04-11
Publication date: 2019-07-05
Anticipated expiration: 2039-04-11
Also published as: CN109974737B; LU101606B1

Abstract

The disclosure provides a route planning method and system based on the combination of safety evacuation signs and reinforcement learning. The path planning method comprises: establishing and rasterizing a two-dimensional simulation scene model, and initializing the obstacles, agents and safety evacuation signs in the model; and performing route planning by combining the safety evacuation signs with the Q-Learning algorithm, specifically: initializing the Q value of each agent in the Q table to 0; acquiring the state information of each agent at the current moment, calculating the corresponding reward, and moving each agent by selecting the action with the largest Q value; calculating the immediate reward of each agent after it moves to a new position, updating the Q table, and judging whether the Q table has converged; if so, obtaining the optimal path sequence; otherwise, receiving and summarizing the input environment information, the corresponding state, the action taken, the reward obtained and the output environment information sent by each agent, distributing the summarized information to every agent, and continuing to move the agents.

Description

Route planning method and system based on combination of safety evacuation signs and reinforcement learning

Technical field

The disclosure belongs to the field of path planning, and in particular relates to a path planning method and system based on the combination of safety evacuation signs and reinforcement learning.

Background art

The statements in this section merely provide background information related to the disclosure and do not necessarily constitute prior art.

In recent years, with the rapid urbanization of China, the number and scale of buildings in urban public places have kept growing, which means that the safety burden to be borne also keeps increasing. Quickly and realistically simulating crowd evacuation paths when an emergency occurs in a public place has therefore become a pressing problem. Simulating crowd evacuation paths helps security departments predict how crowds will evacuate when an emergency occurs, so that they can propose effective motion-planning solutions that shorten evacuation time and reduce casualties.

The inventors found that the relatively mature motion planning methods currently available, such as the A* algorithm, the artificial potential field method, cellular automata, simulated annealing, genetic algorithms and reinforcement learning algorithms, cannot rapidly adapt to complex environments, learn, and respond in time, which results in low path-planning efficiency and poor accuracy.

Summary of the invention

To solve the above problems, a first aspect of the disclosure provides a path planning method based on the combination of safety evacuation signs and reinforcement learning. It combines safety evacuation signs with reinforcement learning and does not depend on an environment model: through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state, and, together with the guidance provided by the safety evacuation signs, the optimal path in a complex environment can be found quickly.

To achieve the above object, the disclosure adopts the following technical solution:

A path planning method based on the combination of safety evacuation signs and reinforcement learning, comprising:

Step 1: establishing and rasterizing a two-dimensional simulation scene model, and initializing the obstacles, agents and safety evacuation signs in the model;

Step 2: performing path planning by combining the safety evacuation signs with the Q-Learning algorithm;

The detailed process of Step 2 is as follows:

Step 2.1: initializing the Q value of each agent in the Q table to 0;

Step 2.2: acquiring the state information of each agent at the current moment, calculating the corresponding reward, and moving each agent by selecting the action with the largest Q value;

Step 2.3: calculating the immediate reward of each agent after it moves to a new position, updating the Q table, and judging whether the Q table has converged; if so, obtaining the optimal path sequence; otherwise, proceeding to the next step;

Step 2.4: receiving and summarizing the input environment information, the corresponding state, the action taken, the reward obtained and the output environment information sent by each agent, then distributing the summarized information to every agent to realize information sharing, and returning to Step 2.2.
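Step 2.3 does not specify how convergence of the Q table is judged. A minimal sketch, assuming the Q table is a dictionary keyed by (state, action) and that "converged" means no entry changed by more than a small tolerance between two successive iterations (both are assumptions, not part of the disclosure):

```python
from typing import Dict

# Hypothetical Q-table layout: (state, action) -> Q value.
QTable = Dict[tuple, float]


def q_table_converged(old: QTable, new: QTable, tol: float = 1e-4) -> bool:
    """Return True when no Q value moved by more than `tol` between two
    successive iterations (an assumed convergence criterion)."""
    keys = set(old) | set(new)
    # Entries missing from a table count as 0, matching "initialize Q to 0".
    return all(abs(new.get(k, 0.0) - old.get(k, 0.0)) <= tol for k in keys)
```

A caller would snapshot the table before each iteration and stop once `q_table_converged(snapshot, current)` holds; the tolerance trades run time against how close the values are to their fixed point.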

To solve the above problems, a second aspect of the disclosure provides a path planning system based on the combination of safety evacuation signs and reinforcement learning. It likewise combines safety evacuation signs with reinforcement learning and does not depend on an environment model: through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state, and, together with the guidance provided by the safety evacuation signs, the optimal path in a complex environment can be found quickly.

A path planning system based on the combination of safety evacuation signs and reinforcement learning, comprising:

a two-dimensional simulation scene model initialization module, configured to establish and rasterize a two-dimensional simulation scene model and to initialize the obstacles, agents and safety evacuation signs in the model;

a path planning module, configured to perform path planning by combining the safety evacuation signs with the Q-Learning algorithm;

The path planning module comprises:

a Q-table initialization module, configured to initialize the Q value of each agent in the Q table to 0;

an agent movement module, configured to acquire the state information of each agent at the current moment, calculate the corresponding reward, and move each agent by selecting the action with the largest Q value;

a Q-table convergence judgment module, configured to calculate the immediate reward of each agent after it moves to a new position, update the Q table, judge whether the Q table has converged, and obtain the optimal path sequence once it has converged;

an information sharing module, configured to, when the Q table has not converged, receive and summarize the input environment information, the corresponding state, the action taken, the reward obtained and the output environment information sent by each agent, then distribute the summarized information to every agent to realize information sharing, continue moving each agent according to the Q values, update the Q table, and judge whether the updated Q table has converged.

To solve the above problems, a third aspect of the disclosure provides a computer-readable storage medium. It likewise combines safety evacuation signs with reinforcement learning and does not depend on an environment model: through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state, and, together with the guidance provided by the safety evacuation signs, the optimal path in a complex environment can be found quickly.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning described above.

To solve the above problems, a fourth aspect of the disclosure provides a computer device. It likewise combines safety evacuation signs with reinforcement learning and does not depend on an environment model: through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state, and, together with the guidance provided by the safety evacuation signs, the optimal path in a complex environment can be found quickly.

A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning described above.

The beneficial effects of the disclosure are:

(1) The disclosure combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state, and, together with the guidance provided by the safety evacuation signs, the optimal path in a complex environment can be found quickly.

(2) Because prior knowledge is lacking, the paths found by reinforcement learning in the early iterations are often not optimal. To address this, multi-agent information sharing is used to enlarge the region of the environment about which information is known, which improves search efficiency and reduces the time needed to reach the destination.

Brief description of the drawings

The accompanying drawings, which form part of this disclosure, are provided for further understanding of the disclosure. The exemplary embodiments of the disclosure and their description serve to explain the disclosure and do not constitute an undue limitation on it.

Fig. 1 is a flow chart of a path planning method based on the combination of safety evacuation signs and reinforcement learning provided by an embodiment of the disclosure.

Fig. 2 shows the result of the two-dimensional scene modeling provided by an embodiment of the disclosure.

Fig. 3 is a schematic diagram of the placement of safety evacuation signs provided by an embodiment of the disclosure.

Fig. 4 is a process diagram of path planning that combines safety evacuation signs with the Q-Learning algorithm, provided by an embodiment of the disclosure.

Fig. 5 is a diagram of the interaction between an agent and the environment provided by an embodiment of the disclosure.

Fig. 6 is a schematic diagram of information sharing among agents provided by an embodiment of the disclosure.

Fig. 7 is a schematic structural diagram of a path planning system based on the combination of safety evacuation signs and reinforcement learning provided by an embodiment of the disclosure.

Fig. 8 is a schematic structural diagram of the path planning module provided by an embodiment of the disclosure.

Fig. 9 is a schematic diagram of the principle of the information sharing module provided by an embodiment of the disclosure.

Detailed description of the embodiments

The disclosure is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the disclosure. Unless otherwise specified, all technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the art to which the disclosure belongs.

It should be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the exemplary embodiments of the disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should further be understood that the terms "comprise" and/or "include", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.

Embodiment 1

As shown in Fig. 1, the path planning method based on the combination of safety evacuation signs and reinforcement learning of this embodiment comprises:

Step 1: establishing and rasterizing a two-dimensional simulation scene model, and initializing the obstacles, agents and safety evacuation signs in the model.

To improve realism, the virtual environment is built from the scene data of a real shopping plaza. The virtual environment is defined as a region of size M*N, which is then rasterized, and each grid cell is numbered. Each cell is denoted (x_i, y_i), where x_i is the row and y_i is the column in which the cell lies, and M and N are positive integers.
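The rasterization and numbering can be sketched as follows. Row-major numbering and a fixed square cell size for mapping plaza coordinates to cells are assumptions, and `rasterize` and `cell_of` are hypothetical names:

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (x_i, y_i): row index, column index


def rasterize(m: int, n: int) -> Dict[Cell, int]:
    """Number every cell of an M*N grid."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must be positive integers")
    # Row-major numbering (assumed): cell (x, y) gets number x * N + y.
    return {(x, y): x * n + y for x in range(m) for y in range(n)}


def cell_of(px: float, py: float, cell_size: float) -> Cell:
    """Map a continuous plaza coordinate to the grid cell containing it.
    px runs along the rows and py along the columns (assumed convention)."""
    return (int(px // cell_size), int(py // cell_size))
```

For example, `rasterize(2, 3)` numbers the six cells 0 to 5, with cell (1, 2) receiving number 5.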

In Step 1, the process of initializing the obstacles, agents and safety evacuation signs in the two-dimensional simulation scene model comprises:

defining each agent as a point particle that has mass but no volume, and setting a circular region of preset radius centered on the agent as its collision detection region;

setting the number, positions and occupied areas of the obstacles;

setting the number, positions, occupied areas and indicated content of the safety evacuation signs. The result of the two-dimensional scene modeling is shown in Fig. 2.
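The circular collision detection region can be sketched as a circle-versus-rectangle overlap test. This assumes that each obstacle occupies an axis-aligned rectangle; `Agent`, `Obstacle` and `collides` are hypothetical names, and the default radius is illustrative only:

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    x: float
    y: float
    mass: float = 1.0    # the agent has mass but no volume
    radius: float = 0.5  # preset collision-detection radius (illustrative)


@dataclass
class Obstacle:
    # Axis-aligned rectangle occupied by the obstacle (assumed shape).
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def collides(agent: Agent, obs: Obstacle) -> bool:
    """True if the agent's detection circle overlaps the obstacle."""
    # Closest point of the rectangle to the circle centre.
    cx = min(max(agent.x, obs.x_min), obs.x_max)
    cy = min(max(agent.y, obs.y_min), obs.y_max)
    return math.hypot(agent.x - cx, agent.y - cy) <= agent.radius
```

Clamping the agent's position onto the rectangle gives the nearest obstacle point, so a single distance comparison decides the overlap.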

The placement rules for the safety evacuation signs comprise:

Command rules of the safety evacuation signs, as follows:

↑: go straight; ←: go left; →: go right; ×: no through traffic; a further sign indicates that one may go forward or back; another indicates that one may go left or right; another indicates a left turn; and another indicates a right turn. Each safety evacuation sign and its command are stored as a pair in a database.
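Storing each sign together with its command in a database might look like the following in-memory SQLite sketch. The table and column names are assumptions, and the keys for the last four signs are hypothetical placeholder strings because their symbols are not reproduced in the text:

```python
import sqlite3
from typing import Optional

# Sign -> command pairs from the command rules; the last four keys are
# hypothetical placeholders for signs whose symbols are not given.
SIGN_COMMANDS = [
    ("↑", "go straight"),
    ("←", "go left"),
    ("→", "go right"),
    ("×", "no through traffic"),
    ("fwd_or_back", "go forward or back"),
    ("left_or_right", "go left or right"),
    ("turn_left", "turn left"),
    ("turn_right", "turn right"),
]


def build_sign_db() -> sqlite3.Connection:
    """Create an in-memory table pairing each sign with its command."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sign_command (sign TEXT PRIMARY KEY, command TEXT NOT NULL)")
    conn.executemany("INSERT INTO sign_command VALUES (?, ?)", SIGN_COMMANDS)
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, sign: str) -> Optional[str]:
    """Return the command for a sign, or None if the sign is unknown."""
    row = conn.execute("SELECT command FROM sign_command WHERE sign = ?", (sign,)).fetchone()
    return row[0] if row else None
```

An agent that perceives a sign would call `lookup` to translate it into one of the A3 instructions described later in Step 2.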

Position rules of the safety evacuation signs, as follows:

a preset number of safety evacuation signs are placed in densely populated areas, at the plaza entrances and at the plaza corners to prevent crowd congestion;

a preset number of safety evacuation signs are placed in remote areas to prevent people from being trapped;

no-entry signs are placed at important places that carry safety risks, such as machine rooms, and at out-of-bounds areas.

Placement in other areas must comply with the general rules for setting safety signs.

For example:

Where the crowd is dense, at entrances and at corners, more go-straight or turn-left/turn-right signs are set up so that people can choose quickly and crowding is avoided; more go-straight or turn-left/turn-right signs are set up in remote areas so that people unfamiliar with the routes are not trapped and unable to leave the scene; no-entry signs are set up at specific locations that carry safety risks or are not open to the public, to avoid accidents; elsewhere in the scene, signs are placed reasonably according to the actual situation and must comply with the general rules for setting safety signs. The placement is shown in Fig. 3; besides the basic directions described above, the figure also contains combinations of the basic directions, which are not described one by one.

Wherein, densely populated place area and remote area are the regions for simulating actual scene；Densely populated place area is super for flow of the people p Cross the region of preset flow pt1；Remote area can be preset as flow of the people P less than preset flow pt2 and apart from two-dimensional simulation scene Model boundary is no more than the region of pre-determined distance.Wherein, pt2 is less than pt1.
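The two region definitions reduce to threshold tests on the pedestrian flow p and the distance to the model boundary. A sketch, with `classify_region` and `max_dist` as hypothetical names for the function and the preset distance:

```python
def classify_region(p: float, dist_to_boundary: float, pt1: float, pt2: float,
                    max_dist: float) -> str:
    """Classify a region by pedestrian flow p and distance to the boundary."""
    if not pt2 < pt1:
        raise ValueError("pt2 must be less than pt1")
    if p > pt1:
        return "dense"      # densely populated area
    if p < pt2 and dist_to_boundary <= max_dist:
        return "remote"     # remote area
    return "ordinary"       # placed under the general safety-sign rules
```

The label then selects which position rule (dense, remote or general) governs how many signs are placed in that region.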

Step 2: performing path planning by combining the safety evacuation signs with the Q-Learning algorithm.

Reinforcement learning works mainly by letting the agents keep trying, and keep making mistakes, in the virtual environment, and adjusting the learning strategy according to the reward values fed back by the environment so as to maximize the cumulative reward obtained during learning. This optimizes the action taken at every step, so the final output path is the optimal path. When the reward fed back by the environment for an action executed by an agent is positive, the tendency to perform that action increases; otherwise, the tendency to perform it decreases.

In the initial state, the agents know nothing about the environment and must learn independently, so each agent's initial action choice is random. When one round of reinforcement-learning iteration combined with the safety evacuation signs has been completed, the agents have accumulated some experience and then share resource information. Each agent treats the information obtained from the other agents as its own experience and learns from it: when, in later iterations, it encounters a state that appears in the shared information, it may choose to execute the action with the largest reward value and then update its own Q values.

As shown in Fig. 4, the detailed process of path planning combining the safety evacuation signs with the Q-Learning algorithm in Step 2 is as follows:

The reinforcement learning algorithm is an online learning method distinct from both supervised and unsupervised learning. The agent interacts with the environment by perceiving states, selecting actions and receiving rewards, as shown in Fig. 5. At each step, the agent observes the environment state and selects and executes an action, which changes its state and earns a reward. Each exploration by the agent from the start point to the end point is called one iteration; over many iterations the agent's learning ability becomes stronger, and an optimal policy is finally obtained. The Q-Learning algorithm, one of the reinforcement learning algorithms, is defined as follows:

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$

Here, the term r_t + γ·max Q(s_{t+1}, a_{t+1}) in the brackets is the real Q value, denoted Q_real(s_t, a_{t+1}); the term Q(s_t, a_t) in the brackets is the estimated Q value, denoted Q_est(s_t, a_{t+1}). γ is the discount factor applied to future rewards, with 0 < γ < 1; α is the learning rate, with 0 < α < 1, which determines how much of the current error is learned; s_t is the state at time t, a_t is the action taken at time t, r_t is the reward obtained at time t, s_{t+1} is the state at time t+1, and a_{t+1} is the action taken at time t+1.

The above formula can be abbreviated as:

$$Q_{\text{new}}(s_t,a_t) = Q_{\text{old}}(s_t,a_t) + \alpha\left(Q_{\text{real}}(s_t,a_{t+1}) - Q_{\text{est}}(s_t,a_{t+1})\right)$$

where Q_old(s_t, a_t) denotes the old Q value and Q_new(s_t, a_t) the new Q value.
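A single application of this update, written with the document's Q_real and Q_est terms, might look as follows. The dictionary-based Q table and the default α = 0.1 are assumptions; γ = 0.8 is the value the embodiment uses:

```python
from collections import defaultdict
from typing import DefaultDict, Sequence

# Hypothetical Q-table layout: (state, action) -> Q value, defaulting to 0.
QTable = DefaultDict[tuple, float]


def q_update(q: QTable, s, a, r: float, s_next, actions: Sequence,
             alpha: float = 0.1, gamma: float = 0.8) -> float:
    """One Q-Learning update; returns Q_new(s, a)."""
    q_est = q[(s, a)]                                           # Q_est
    q_real = r + gamma * max(q[(s_next, b)] for b in actions)  # Q_real
    q[(s, a)] = q_est + alpha * (q_real - q_est)                # Q_new
    return q[(s, a)]


q = defaultdict(float)
q[((0, 1), "right")] = 10.0
new_value = q_update(q, (0, 0), "right", 1.0, (0, 1), ["right", "down"], alpha=0.5)
```

Here Q_real = 1 + 0.8 × 10 = 9 and Q_est = 0, so with α = 0.5 the new value is 4.5.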

This embodiment applies the safety evacuation signs and the reinforcement learning algorithm to path planning. In this process, the agent's action set A is divided into three parts, basic actions A1, group actions A2 and optimal actions A3, written A = (A1, A2, A3). The basic actions A1 are the eight short moves available to each agent, expressed as A1 = (up, down, left, right, ul, dl, ur, dr);

where up, down, left, right, ul, dl, ur and dr denote moving up, down, left, right, up-left, down-left, up-right and down-right, respectively.

The group action A2 means that the agent follows the action of the group head. The optimal actions A3 are the long moves by which the agent follows the eight basic instructions of the safety evacuation signs, expressed as:

A3 = (forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l, turn-r)

The state set S represents each step taken by the agent.

Here, forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l and turn-r mean go straight, go left, go right, stop, go straight or go back, go left or go right, turn left and turn right, respectively.
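The basic action set A1 can be encoded as grid offsets. The sketch below adds a helper that drops moves blocked by walls or obstacles and one possible reading of the group action A2 as "take the A1 step toward the group head". The offset convention and both helpers are assumptions for illustration; A3 is listed only by name:

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]

# A1: eight one-cell moves as (row delta, column delta), rows growing downward.
A1: Dict[str, Cell] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
    "ul": (-1, -1), "dl": (1, -1), "ur": (-1, 1), "dr": (1, 1),
}
# A3: long moves that follow the eight basic sign instructions.
A3 = ("forward", "go-l", "go-r", "stop", "fwd or dwbk", "go-l or go-r", "turn-l", "turn-r")


def valid_moves(cell: Cell, m: int, n: int, blocked: Set[Cell]) -> List[str]:
    """A1 moves that stay inside the M*N grid and avoid blocked cells."""
    x, y = cell
    return [name for name, (dx, dy) in A1.items()
            if 0 <= x + dx < m and 0 <= y + dy < n and (x + dx, y + dy) not in blocked]


def follow_leader(agent: Cell, leader: Cell) -> Optional[str]:
    """Hypothetical reading of A2: the A1 step that closes the gap to the group head."""
    dx = (leader[0] > agent[0]) - (leader[0] < agent[0])
    dy = (leader[1] > agent[1]) - (leader[1] < agent[1])
    for name, delta in A1.items():
        if delta == (dx, dy):
            return name
    return None  # already at the group head's cell
```

For an agent in the top-left corner of a 3 × 3 grid, only `down`, `right` and `dr` remain valid.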

The learning process of motion planning that combines the safety evacuation signs with the Q-Learning algorithm is as follows:

1) Initialize Q(s, a) to 0;

2) The agent observes its state information s_t at time t;

3) According to the current state and the reward value r_t, the agent selects the action a_t with the largest Q value and moves;

4) When the action selected by the agent acts on the environment, the environment state changes:

the current position transitions to the next new position s_{t+1}, and an immediate reward r_t is given, where r_t is defined as follows:

5) Update the Q table, with γ set to 0.8, and judge whether the Q table has converged; if so, stop the loop and obtain the optimal path sequence; otherwise, proceed to the next step;

6) Receive and summarize the input environment information, the corresponding state, the action taken, the reward obtained and the output environment information sent by each agent, then distribute the summarized information to every agent to realize information sharing, and return to step 2).
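Putting steps 1) to 6) together, the following sketch trains one Q table per agent on a small grid and shares experiences through a headquarters list after every round. Everything environmental here is hypothetical: the reward values (the text's definition of r_t is not reproduced), the bonus for obeying a sign, α = 0.5 and the convergence tolerance. Greedy selection with random tie-breaking reproduces the random initial choice described earlier, because all Q values start at 0:

```python
import random
from collections import defaultdict

# A1 moves as (row delta, column delta); an assumed encoding.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
         "ul": (-1, -1), "dl": (1, -1), "ur": (-1, 1), "dr": (1, 1)}


def step(cell, a, m, n, blocked, goal, signs):
    """Hypothetical environment: returns (next_cell, immediate reward r_t)."""
    nxt = (cell[0] + MOVES[a][0], cell[1] + MOVES[a][1])
    if not (0 <= nxt[0] < m and 0 <= nxt[1] < n) or nxt in blocked:
        return cell, -10.0                   # hit a wall or obstacle: stay put
    r = 100.0 if nxt == goal else -1.0       # goal bonus, otherwise a step cost
    if signs.get(cell) == a:
        r += 2.0                             # bonus for obeying the sign here
    return nxt, r


def greedy(q, s, rng):
    """Largest-Q action, ties broken at random (every action ties at first)."""
    best = max(q[(s, a)] for a in MOVES)
    return rng.choice([a for a in MOVES if q[(s, a)] == best])


def update(q, s, a, r, s2, goal, alpha, gamma):
    future = 0.0 if s2 == goal else max(q[(s2, b)] for b in MOVES)
    q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])


def train(m, n, blocked, goal, signs, starts, episodes=300, alpha=0.5,
          gamma=0.8, max_steps=200, tol=1e-6, seed=0):
    rng = random.Random(seed)
    tables = [defaultdict(float) for _ in starts]    # step 1): Q = 0 per agent
    for _ in range(episodes):
        before = [dict(t) for t in tables]
        shared = []                                  # experiences sent to HQ
        for q, start in zip(tables, starts):
            s = start
            for _ in range(max_steps):               # steps 2) to 5)
                if s == goal:
                    break
                a = greedy(q, s, rng)
                s2, r = step(s, a, m, n, blocked, goal, signs)
                update(q, s, a, r, s2, goal, alpha, gamma)
                shared.append((s, a, r, s2))
                s = s2
        for q in tables:                             # step 6): redistribute
            for s, a, r, s2 in shared:
                update(q, s, a, r, s2, goal, alpha, gamma)
        if all(abs(t.get(k, 0.0) - b.get(k, 0.0)) <= tol
               for t, b in zip(tables, before) for k in set(t) | set(b)):
            break                                    # Q tables converged
    return tables


def best_path(q, start, goal, m, n, blocked, max_len=100):
    """Roll out the greedy policy to read off the optimal path sequence."""
    path, s = [start], start
    while s != goal and len(path) <= max_len:
        a = max(MOVES, key=lambda b: q[(s, b)])
        s, _ = step(s, a, m, n, blocked, goal, {})
        path.append(s)
    return path
```

With a step cost of -1, actions already tried fall below the untried ones at 0, so purely greedy selection still explores early on; once the goal reward propagates back, the greedy rollout in `best_path` yields the path sequence. Replaying every agent's experiences into every table is one simple way to realize step 6), not necessarily the embodiment's.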

Because this embodiment simulates the real crowd movement of a shopping plaza, the crowd consists of many agents. An agent cannot exist alone: individual movement does not match the group characteristics of people, and in an evacuation scene a single agent cannot complete the task efficiently. The limited scene resources it can grasp slow down its learning and delay the output of the optimal path; in the worst case, it may be unable to complete the goal task at all. Therefore, before the next reinforcement-learning iteration, each agent outputs the environment information obtained from its own reinforcement learning to a headquarters information processor, which then issues the summarized information to every agent. This realizes information sharing among multiple agents, where the shared information includes strategies, experience and environment states. Each agent then updates its own resources with the information obtained from the headquarters information processor and, taking into account its own Q values and historical strategies, determines its action policy for the next iteration, as shown in Fig. 6.
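The headquarters information processor can be sketched as a collector that summarizes the experiences sent by the agents and lets an agent look up the best-rewarded action for a state it meets again, as described for shared experience earlier in Step 2. The record fields follow the text; the encoding of environment information as tuples and the rule of keeping the maximum reward per (state, action) pair are assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]


@dataclass(frozen=True)
class Experience:
    env_in: Tuple[float, ...]    # input environment information (assumed encoding)
    state: State
    action: str
    reward: float
    env_out: Tuple[float, ...]   # output environment information (assumed encoding)


@dataclass
class Headquarters:
    inbox: List[Experience] = field(default_factory=list)

    def receive(self, batch: List[Experience]) -> None:
        """Collect the records one agent sends after an iteration."""
        self.inbox.extend(batch)

    def summarize(self) -> Dict[State, Dict[str, float]]:
        """Best reward seen per (state, action); clears the inbox for the next round."""
        summary: Dict[State, Dict[str, float]] = {}
        for e in self.inbox:
            per_state = summary.setdefault(e.state, {})
            per_state[e.action] = max(e.reward, per_state.get(e.action, float("-inf")))
        self.inbox.clear()
        return summary


def best_shared_action(summary: Dict[State, Dict[str, float]], state: State) -> Optional[str]:
    """Action with the largest shared reward for a state, if the state was seen."""
    actions = summary.get(state)
    if not actions:
        return None
    return max(actions, key=actions.get)
```

The summary returned by `summarize` is what would be distributed to every agent before the next iteration.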

This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state, and, together with the guidance provided by the safety evacuation signs, the optimal path in a complex environment can be found quickly.

This embodiment also uses multi-agent information sharing to enlarge the region of the environment about which information is known, which improves search efficiency and reduces the time needed to reach the destination.

Embodiment 2

As shown in Fig. 7, this embodiment provides a path planning system based on the combination of safety evacuation signs and reinforcement learning, comprising:

(1) A two-dimensional simulation scene model initialization module, configured to establish and rasterize a two-dimensional simulation scene model and to initialize the obstacles, agents and safety evacuation signs in the model.

Setting the number, positions and occupied areas of the obstacles;

The placement rules for the safety evacuation signs comprise:

Command rules of the safety evacuation signs, as follows:

Position rules of the safety evacuation signs, as follows:

For example:

(2) A path planning module, configured to perform path planning by combining the safety evacuation signs with the Q-Learning algorithm.

As shown in Fig. 8, the path planning module comprises:

(2.1) a Q-table initialization module, configured to initialize the Q value of each agent in the Q table to 0;

(2.2) an agent movement module, configured to acquire the state information of each agent at the current moment, calculate the corresponding reward, and move each agent by selecting the action with the largest Q value;

(2.3) a Q-table convergence judgment module, configured to calculate the immediate reward of each agent after it moves to a new position, update the Q table, judge whether the Q table has converged, and obtain the optimal path sequence once it has converged;

(2.4) an information sharing module, configured to, when the Q table has not converged, receive and summarize the input environment information, the corresponding state, the action taken, the reward obtained and the output environment information sent by each agent, then distribute the summarized information to every agent to realize information sharing, continue moving each agent according to the Q values, update the Q table, and judge whether the updated Q table has converged.

The Q-value update, in abbreviated form, is:

$$Q_{\text{new}}(s_t,a_t) = Q_{\text{old}}(s_t,a_t) + \alpha\left(Q_{\text{real}}(s_t,a_{t+1}) - Q_{\text{est}}(s_t,a_{t+1})\right)$$

1) Initialize Q(s, a) to 0;

2) The agent observes its state information s_t at time t;

In a specific implementation, the information sharing module comprises two parts: the main processor of each agent and the headquarters information controller. The agent's main processor inputs the environment information (for example, in the current state, the distances and angles between the agent and the obstacles and safety evacuation signs, and the content of the safety evacuation signs), outputs the state s_t, the action taken a_t, the reward obtained r_t and the environment information, and manages the information the agent itself has obtained. The headquarters information controller summarizes the information shared by every agent and then distributes it to every agent again, thereby realizing information sharing so that the next iteration can proceed quickly, as shown in Fig. 9.
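The input environment information handled by an agent's main processor (distance and angle to obstacles and signs, plus the sign content) could be computed as below. Measuring the angle in degrees counter-clockwise from the +x axis is an assumed convention, and `relative_observation` and `observe` are hypothetical names:

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def relative_observation(agent: Point, target: Point) -> Tuple[float, float]:
    """Distance and bearing (degrees, counter-clockwise from +x) to a target."""
    dx, dy = target[0] - agent[0], target[1] - agent[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


def observe(agent: Point, obstacles: Sequence[Point],
            signs: Sequence[Tuple[Point, str]]) -> Dict[str, List[tuple]]:
    """Assemble the agent's input environment information for one state."""
    return {
        "obstacles": [relative_observation(agent, o) for o in obstacles],
        "signs": [(*relative_observation(agent, p), content) for p, content in signs],
    }
```

Each sign entry carries its content string alongside the geometry, so the same record can be forwarded to the headquarters information controller unchanged.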

Embodiment 3

This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning shown in Fig. 1.

Embodiment 4

This embodiment provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning shown in Fig. 1.

Those skilled in the art should understand that embodiments of the disclosure may be provided as a method, a system or a computer program product. Therefore, the disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The disclosure is described with reference to flow charts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the disclosure. It should be understood that each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above are merely preferred embodiments of the disclosure and are not intended to limit it; for those skilled in the art, the disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the disclosure shall fall within its scope of protection.

Claims

1. a kind of paths planning method combined based on safe escape mark and intensified learning characterized by comprising

Step 1: establishing simultaneously rasterizing two-dimensional simulation model of place, barrier, intelligence in initialization two-dimensional simulation model of place Body and safe escape Warning Mark；

Step 2: carrying out path planning in conjunction with safe escape Warning Mark and Q-Learn i ng algorithm；

The detailed process of step 2 is as follows:

Step 2.1: initializing the Q value corresponding to each agent in the Q-value table to 0;

Step 2.2: obtaining the state information of each agent at the current moment, calculating the corresponding reward, and selecting the action with the larger Q value to move each agent;

Step 2.3: calculating the instant reward of each agent that has moved to a new position, updating the Q-value table, and judging whether the Q-value table has converged; if so, obtaining the optimal path sequence; otherwise, proceeding to the next step;

Step 2.4: receiving and summarizing the input environment information sent by each agent together with its corresponding state, the action taken, the reward obtained and the output environment information, then distributing the summarized information to each agent to realize information sharing, and returning to step 2.2.
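
The loop of steps 2.1 to 2.4 can be sketched as multi-agent tabular Q-Learning on the rasterized grid. This is a minimal sketch under assumptions that the claim does not state: the action set is one grid step in four directions, action selection is epsilon-greedy (the claim only says the action with the larger Q value is chosen), the reward values are illustrative stand-ins for the patent's r_t, a small bonus is given when an agent moves in the direction indicated by a sign in its cell, and "converged" is taken to mean that no Q value changed by more than `tol` during an episode. The names `step`, `train` and `greedy_path` are hypothetical, and collisions between agents are ignored for brevity.

```python
import random
from collections import defaultdict

# Hypothetical action set: one grid step up, down, left or right as (row, col) offsets.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(pos, a, m, n, obstacles, exits, signs, sign_bonus=0.05):
    """Move from pos with action index a; return (next_pos, r_t, done).

    The reward values are illustrative assumptions, not the patent's r_t.
    """
    dr, dc = ACTIONS[a]
    r, c = pos[0] + dr, pos[1] + dc
    if not (0 <= r < m and 0 <= c < n) or (r, c) in obstacles:
        return pos, -1.0, False  # blocked by a wall or obstacle: stay put, penalty
    if (r, c) in exits:
        return (r, c), 10.0, True  # reached an exit
    # Small step cost, reduced when the agent follows the sign in its current cell.
    bonus = sign_bonus if signs.get(pos) == ACTIONS[a] else 0.0
    return (r, c), -0.1 + bonus, False


def train(starts, m, n, obstacles, exits, signs, alpha=0.5, gamma=0.9,
          epsilon=0.1, episodes=500, tol=1e-4, seed=0):
    rng = random.Random(seed)
    # Step 2.1: one Q-value table per agent, with every Q value initialized to 0.
    q = [defaultdict(lambda: [0.0] * len(ACTIONS)) for _ in starts]
    for _ in range(episodes):
        pos, done = list(starts), [False] * len(starts)
        max_delta = 0.0
        for _ in range(4 * m * n):  # cap the episode length
            shared = []  # transitions (s, a, r_t, s', done) collected for step 2.4
            for i, p in enumerate(pos):
                if done[i]:
                    continue
                # Step 2.2: take the action with the largest Q value,
                # exploring at random with probability epsilon (an assumption).
                if rng.random() < epsilon:
                    a = rng.randrange(len(ACTIONS))
                else:
                    a = max(range(len(ACTIONS)), key=q[i][p].__getitem__)
                nxt, reward, d = step(p, a, m, n, obstacles, exits, signs)
                shared.append((p, a, reward, nxt, d))
                pos[i], done[i] = nxt, d
            # Steps 2.3 and 2.4: every agent applies the summarized transitions of
            # all agents to its own table. With full sharing the tables stay identical.
            for table in q:
                for s, a, reward, s2, d in shared:
                    target = reward if d else reward + gamma * max(table[s2])
                    delta = alpha * (target - table[s][a])
                    table[s][a] += delta
                    max_delta = max(max_delta, abs(delta))
            if all(done):
                break
        if max_delta < tol:  # assumed convergence test: no Q value moved by tol
            break
    return q


def greedy_path(table, start, m, n, obstacles, exits, signs, limit=100):
    """Read off the path sequence by always taking the highest-valued action."""
    path, pos = [start], start
    for _ in range(limit):
        a = max(range(len(ACTIONS)), key=table[pos].__getitem__)
        pos, _, done = step(pos, a, m, n, obstacles, exits, signs)
        path.append(pos)
        if done:
            break
    return path
```

For example, `train([(2, 0), (2, 2)], 3, 3, {(1, 1)}, {(0, 2)}, {(2, 0): (-1, 0)})` trains two agents on a 3*3 grid with one obstacle in the centre, an exit in the top-right cell and a sign at (2, 0) pointing up; `greedy_path` then reads off a path sequence from a trained table. Giving each agent every shared transition is the simplest reading of step 2.4; partial sharing would let the tables diverge.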

2. The path planning method based on the combination of safety evacuation signs and reinforcement learning according to claim 1, characterized in that, in step 2.3, the instant reward of each agent that has moved to a new position is set as r_t;
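
Claim 2 names the instant reward r_t, but its defining formula is not present in this text. For orientation only, the standard one-step Q-Learning update through which such a reward would enter the Q-value table in step 2.3 is shown below; that the disclosure uses exactly this form, and the learning rate α and discount factor γ, are assumptions not stated here.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],
\qquad 0 < \alpha \le 1, \quad 0 \le \gamma < 1
```

Here s_t and a_t are an agent's state and action at time t, and s_{t+1} is the new position it moves to.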

3. The path planning method based on the combination of safety evacuation signs and reinforcement learning according to claim 1, characterized in that, in step 1, the process of rasterizing the two-dimensional simulation scene model is as follows:

The two-dimensional simulation scene model is defined as a region of size M*N, which is then rasterized, and each grid cell is numbered, where M and N are positive integers.
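
The rasterization of claim 3 can be sketched as below, assuming unit-size square cells numbered row by row starting from 0; the claim fixes neither the numbering order nor the cell size. `Grid`, `cell_id`, `cell_pos` and `point_to_cell` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """An M*N rasterized scene; cells are numbered row by row from 0 (assumed)."""

    m: int  # number of rows, M
    n: int  # number of columns, N

    def __post_init__(self) -> None:
        if self.m <= 0 or self.n <= 0:
            raise ValueError("M and N must be positive integers")

    def cell_id(self, row: int, col: int) -> int:
        # Number of the cell at (row, col).
        if not (0 <= row < self.m and 0 <= col < self.n):
            raise IndexError("cell outside the scene")
        return row * self.n + col

    def cell_pos(self, cid: int) -> tuple[int, int]:
        # Inverse of cell_id: recover (row, col) from a cell number.
        if not (0 <= cid < self.m * self.n):
            raise IndexError("cell number outside the scene")
        return divmod(cid, self.n)

    def point_to_cell(self, x: float, y: float, cell_size: float = 1.0) -> int:
        # Rasterize a continuous point: x runs along columns, y along rows.
        return self.cell_id(int(y // cell_size), int(x // cell_size))
```

For instance, on a 4*5 scene `Grid(4, 5).cell_id(2, 3)` is 13 and `cell_pos(13)` returns `(2, 3)`. Row-major numbering keeps the cell number and the (row, col) coordinate interconvertible with one multiplication or one `divmod`.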

4. The path planning method based on the combination of safety evacuation signs and reinforcement learning according to claim 1, characterized in that, in step 1, the process of initializing the obstacles, agents and safety evacuation signs in the two-dimensional simulation scene model comprises:

Defining each agent as a mass point that has mass but no volume, and setting a circular region of preset radius centered on the agent as its collision detection region;

Setting the number, positions and occupied area sizes of the obstacles;

Setting the number, positions, occupied area sizes and indication content of the safety evacuation signs.
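
The initialization of claim 4 can be sketched with simple data classes. Assumptions not stated in the claim: obstacles and signs are axis-aligned rectangles given by a lower-left corner and a size, the "number" of each is simply the length of the list holding them, a sign's indication content is encoded as a grid step direction, the collision radius defaults to 0.5, and a collision means a detection circle overlapping an obstacle or another agent's circle. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    """Mass point: it has mass but no volume, so only its position matters."""

    x: float
    y: float
    mass: float = 1.0
    radius: float = 0.5  # preset collision-detection radius (assumed value)

    def in_detection_region(self, px: float, py: float) -> bool:
        # True if (px, py) lies in the circular region centred on the agent.
        return math.hypot(px - self.x, py - self.y) <= self.radius


@dataclass
class Obstacle:
    x: float  # lower-left corner of an assumed axis-aligned rectangle
    y: float
    width: float  # together with height, the occupied area
    height: float

    def hits(self, agent: Agent) -> bool:
        # Clamp the agent's centre onto the rectangle to find its nearest point,
        # then test whether that point is inside the detection region.
        nx = min(max(agent.x, self.x), self.x + self.width)
        ny = min(max(agent.y, self.y), self.y + self.height)
        return agent.in_detection_region(nx, ny)


@dataclass
class EvacuationSign:
    x: float
    y: float
    width: float
    height: float
    direction: tuple[int, int]  # indication content as a grid step (assumed)


def agents_collide(a: Agent, b: Agent) -> bool:
    # Two agents collide when their detection circles overlap.
    return math.hypot(a.x - b.x, a.y - b.y) <= a.radius + b.radius
```

Because an agent is a mass point, all of its collision geometry reduces to distance tests against its centre, which is why a single radius suffices for both obstacle and agent-agent checks.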

5. A path planning system based on the combination of safety evacuation signs and reinforcement learning, characterized by comprising:

A two-dimensional simulation scene model initialization module, configured to establish and rasterize a two-dimensional simulation scene model and to initialize the obstacles, agents and safety evacuation signs in the two-dimensional simulation scene model;

A path planning module, configured to carry out path planning by combining the safety evacuation signs with the Q-Learning algorithm;

The path planning module comprises:

An agent moving module, configured to obtain the state information of each agent at the current moment, calculate the corresponding reward, and select the action with the larger Q value to move each agent;

A Q-value table convergence judgment module, configured to calculate the instant reward of each agent that has moved to a new position, update the Q-value table, and judge whether the Q-value table has converged, the optimal path sequence being obtained when the Q-value table converges;

An information sharing module, configured to, when the Q-value table has not converged, receive and summarize the input environment information sent by each agent together with its corresponding state, the action taken, the reward obtained and the output environment information, then distribute the summarized information to each agent to realize information sharing, continue moving each agent according to the Q values, update the Q-value table, and judge whether the updated Q-value table has converged.

6. The path planning system based on the combination of safety evacuation signs and reinforcement learning according to claim 5, characterized in that, in the Q-value table convergence judgment module, the instant reward of each agent that has moved to a new position is set as r_t;

7. The path planning system based on the combination of safety evacuation signs and reinforcement learning according to claim 5, characterized in that, in the Q-value table, the process of rasterizing the two-dimensional simulation scene model is as follows:

8. The path planning system based on the combination of safety evacuation signs and reinforcement learning according to claim 5, characterized in that, in the Q-value table, the process of initializing the obstacles, agents and safety evacuation signs in the two-dimensional simulation scene model comprises:

Setting the number, positions and occupied area sizes of the obstacles;

9. A computer-readable storage medium storing a computer program, characterized in that, when the program is executed by a processor, the steps in the path planning method based on the combination of safety evacuation signs and reinforcement learning according to any one of claims 1-4 are implemented.

10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when executing the program, the processor implements the steps in the path planning method based on the combination of safety evacuation signs and reinforcement learning according to any one of claims 1-4.