Path planning method and system based on the combination of safety evacuation signs and reinforcement learning
Technical field
The present disclosure belongs to the field of path planning, and more particularly relates to a path planning method and system based on the combination of safety evacuation signs and reinforcement learning.
Background
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
In recent years, with the rapid urbanization of China, the number and scale of buildings in urban public places have kept expanding, which means that the safety pressure to be borne keeps increasing as well. How to quickly and realistically simulate crowd evacuation paths when an emergency occurs in a public place has therefore become a major problem to be solved urgently. Simulating crowd evacuation paths helps security departments predict how a crowd will evacuate when an emergency occurs, and then propose effective motion planning solutions that shorten the evacuation time and reduce casualties.
The inventors have found that the relatively mature motion planning methods available at present, such as the A-star algorithm, the artificial potential field algorithm, cellular automata, simulated annealing, genetic algorithms, and reinforcement learning algorithms, cannot quickly adapt to and learn a complex environment or respond in time, which results in low path planning efficiency and poor accuracy.
Summary
To solve the above problems, a first aspect of the present disclosure provides a path planning method based on the combination of safety evacuation signs and reinforcement learning. The method combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state; together with the guidance of the safety evacuation signs, the optimal path in a complex environment can be found quickly.
To achieve the above object, the present disclosure adopts the following technical solution:
A path planning method based on the combination of safety evacuation signs and reinforcement learning, comprising:
Step 1: establish and rasterize a two-dimensional simulation scene model, and initialize the obstacles, agents, and safety evacuation signs in the two-dimensional simulation scene model;
Step 2: perform path planning by combining the safety evacuation signs with the Q-Learning algorithm.
The detailed process of step 2 is:
Step 2.1: initialize the Q value of every agent in the Q-value table to 0;
Step 2.2: obtain the state information of each agent at the current time, calculate the corresponding reward, and move each agent by the action with the larger Q value;
Step 2.3: calculate the immediate reward of each agent after it moves to its new position, update the Q-value table, and judge whether the Q-value table has converged; if so, obtain the optimal path sequence; otherwise, go to the next step;
Step 2.4: receive and summarize the input environment information sent by each agent, together with its corresponding state, the action taken, the reward obtained, and the output environment information; then distribute the summarized information to every agent to realize information sharing, and return to step 2.2.
To solve the above problems, a second aspect of the present disclosure provides a path planning system based on the combination of safety evacuation signs and reinforcement learning. The system combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state; together with the guidance of the safety evacuation signs, the optimal path in a complex environment can be found quickly.
To achieve the above object, the present disclosure adopts the following technical solution:
A path planning system based on the combination of safety evacuation signs and reinforcement learning, comprising:
a two-dimensional simulation scene model initialization module, configured to establish and rasterize a two-dimensional simulation scene model, and to initialize the obstacles, agents, and safety evacuation signs in the two-dimensional simulation scene model;
a path planning module, configured to perform path planning by combining the safety evacuation signs with the Q-Learning algorithm.
The path planning module comprises:
a Q-value table initialization module, configured to initialize the Q value of every agent in the Q-value table to 0;
an agent moving module, configured to obtain the state information of each agent at the current time, calculate the corresponding reward, and move each agent by the action with the larger Q value;
a Q-value table convergence judgment module, configured to calculate the immediate reward of each agent after it moves to its new position, update the Q-value table, and judge whether the Q-value table has converged; when the Q-value table has converged, the optimal path sequence is obtained;
an information sharing module, configured to, when the Q-value table has not converged, receive and summarize the input environment information sent by each agent, together with its corresponding state, the action taken, the reward obtained, and the output environment information, and then distribute the summarized information to every agent to realize information sharing; the system then continues to move each agent according to the Q values, update the Q-value table, and judge whether the updated Q-value table has converged.
To solve the above problems, a third aspect of the present disclosure provides a computer-readable storage medium. It combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state; together with the guidance of the safety evacuation signs, the optimal path in a complex environment can be found quickly.
To achieve the above object, the present disclosure adopts the following technical solution:
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning described above.
To solve the above problems, a fourth aspect of the present disclosure provides a computer device. It combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state; together with the guidance of the safety evacuation signs, the optimal path in a complex environment can be found quickly.
To achieve the above object, the present disclosure adopts the following technical solution:
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning described above.
The beneficial effects of the present disclosure are:
(1) The present disclosure combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state; together with the guidance of the safety evacuation signs, the optimal path in a complex environment can be found quickly.
(2) Because prior knowledge is lacking, the paths found by reinforcement learning in the initial iterations are often not optimal. To address this, multi-agent information sharing is used to expand the region over which environment information is known, which improves search efficiency and reduces the time needed to reach the destination.
Brief description of the drawings
The accompanying drawings, which form a part of the present disclosure, are provided for a further understanding of the disclosure. The illustrative embodiments of the disclosure and their descriptions are used to explain the disclosure and do not constitute an improper limitation of it.
Fig. 1 is a flowchart of a path planning method based on the combination of safety evacuation signs and reinforcement learning provided by an embodiment of the present disclosure.
Fig. 2 is a two-dimensional modeling result provided by an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the placement of safety evacuation signs provided by an embodiment of the present disclosure.
Fig. 4 is a process diagram of path planning combining safety evacuation signs and the Q-Learning algorithm provided by an embodiment of the present disclosure.
Fig. 5 is a process diagram of the interaction between an agent and the environment provided by an embodiment of the present disclosure.
Fig. 6 is a schematic diagram of agent information sharing provided by an embodiment of the present disclosure.
Fig. 7 is a structural schematic diagram of a path planning system based on the combination of safety evacuation signs and reinforcement learning provided by an embodiment of the present disclosure.
Fig. 8 is a structural schematic diagram of the path planning module provided by an embodiment of the present disclosure.
Fig. 9 is a schematic diagram of the principle of the information sharing module provided by an embodiment of the present disclosure.
Detailed description
The present disclosure is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present disclosure belongs.
It should also be noted that the terminology used herein is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the present disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. In addition, it should be understood that when the terms "comprise" and/or "include" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Embodiment 1
As shown in Fig. 1, the path planning method of this embodiment, based on the combination of safety evacuation signs and reinforcement learning, comprises:
Step 1: establish and rasterize a two-dimensional simulation scene model, and initialize the obstacles, agents, and safety evacuation signs in the two-dimensional simulation scene model.
To improve realism, the virtual environment is built from the scene data of a real shopping plaza. The virtual environment is defined as a region of size M*N, which is then rasterized, and each grid cell is numbered. Each grid cell is denoted $(x_i, y_i)$, where $x_i$ is the row number of the cell and $y_i$ is its column number. M and N are positive integers.
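The rasterization and numbering described above can be sketched as follows. This is only a minimal illustration: the disclosure does not state the numbering scheme, so row-major numbering starting at 0 is an assumption, and the function names `cell_number`, `cell_coords`, and `build_grid` are hypothetical.

```python
def cell_number(x: int, y: int, n_cols: int) -> int:
    """Row-major number of grid cell (x, y); x is the row, y is the column.

    Assumption: numbering starts at 0 and runs row by row.
    """
    return x * n_cols + y


def cell_coords(number: int, n_cols: int) -> tuple[int, int]:
    """Inverse of cell_number: recover (row, column) from a cell number."""
    return divmod(number, n_cols)


def build_grid(m_rows: int, n_cols: int) -> list[list[int]]:
    """Rasterize an M*N region into cells, each holding its own number."""
    if m_rows <= 0 or n_cols <= 0:
        raise ValueError("M and N must be positive integers")
    return [[cell_number(x, y, n_cols) for y in range(n_cols)]
            for x in range(m_rows)]
```

For example, `build_grid(2, 3)` yields two rows of three numbered cells, and `cell_coords(4, 3)` maps cell number 4 back to row 1, column 1.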
In step 1, the process of initializing the obstacles, agents, and safety evacuation signs in the two-dimensional simulation scene model comprises:
defining each agent as a particle that has mass but no volume, and setting a circular region of preset radius centered on the agent as its collision detection region;
setting the number, positions, and occupied area of the obstacles;
setting the number, positions, occupied area, and instruction content of the safety evacuation signs. The two-dimensional modeling result is shown in Fig. 2.
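The agent definition above (a particle with mass, no volume, and a circular collision detection region) could be represented as in the sketch below. The class name `Agent`, the default mass and radius, and the rule that a point counts as a collision when it lies inside the circle are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    x: float             # position of the particle (no volume)
    y: float
    mass: float = 1.0    # assumed default mass
    radius: float = 0.5  # assumed preset radius of the detection circle


def in_collision_region(agent: Agent, px: float, py: float) -> bool:
    """True if point (px, py) lies inside the agent's detection circle."""
    return math.hypot(px - agent.x, py - agent.y) <= agent.radius
```

An obstacle corner or another agent's position can be passed as `(px, py)` to decide whether the agent has to react to it.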
The setting rules of the safety evacuation signs comprise:
The instruction setting rule of the safety evacuation signs, specifically as follows:
↑: go straight; ←: go left; →: go right; ×: no passage; [symbol missing]: may go forward or backward; [symbol missing]: may go left or right; [symbol missing]: turn left; [symbol missing]: turn right. Each safety evacuation sign and its corresponding instruction are stored in a database.
The position setting rule of the safety evacuation signs, specifically as follows:
a preset number of safety evacuation signs are placed in densely populated areas, at mall entrances and exits, and at mall corners, to prevent crowd congestion;
a preset number of safety evacuation signs are placed in remote areas, to prevent people from being trapped;
no-passage signs are placed at important sites with safety risks, such as machine rooms, and at off-limits areas.
Placement in other areas must comply with the general rules for setting safety signs.
For example:
In crowded areas, at entrances and exits, and at corners, more go-straight or turn-left/turn-right safety evacuation signs are set, so that the crowd can choose quickly there and avoid congestion. In remote areas, more go-straight or turn-left/turn-right safety evacuation signs are set, to prevent people from being trapped and unable to escape because they are unfamiliar with the paths. At specific positions that have safety risks or are not open to the public, no-passage safety evacuation signs are set to avoid accidents. Elsewhere in the scene, safety evacuation signs are set reasonably according to the actual scene, subject to the general rules for setting safety signs. The placement is shown in Fig. 3. In addition to the basic safety evacuation sign directions mentioned above, the figure also contains superpositions of the basic directions, which are not described one by one here.
Here, the densely populated areas and the remote areas are regions that simulate the actual scene. A densely populated area is a region whose pedestrian flow p exceeds a preset flow pt1. A remote area can be preset as a region whose pedestrian flow p is less than a preset flow pt2 and whose distance from the boundary of the two-dimensional simulation scene model does not exceed a preset distance, where pt2 is less than pt1.
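The two thresholds above can be turned into a small classification helper. The sketch below is illustrative only: the function name `classify_region`, the label strings, and the numeric values used in the usage note are assumptions, while `pt1`, `pt2`, and the preset distance come from the text.

```python
def classify_region(flow: float, dist_to_boundary: float,
                    pt1: float, pt2: float, max_dist: float) -> str:
    """Classify a region by pedestrian flow p and distance to the model boundary.

    dense:    p > pt1
    remote:   p < pt2 and the boundary distance does not exceed max_dist
    ordinary: everything else (label assumed for illustration)
    """
    if pt2 >= pt1:
        raise ValueError("pt2 must be less than pt1")
    if flow > pt1:
        return "dense"
    if flow < pt2 and dist_to_boundary <= max_dist:
        return "remote"
    return "ordinary"
```

With the hypothetical thresholds `pt1=100`, `pt2=20`, and `max_dist=3`, a region with flow 120 is dense, and a region with flow 5 located 2 cells from the boundary is remote.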
Step 2: perform path planning by combining the safety evacuation signs with the Q-Learning algorithm.
Reinforcement learning works mainly by having the agents keep trying in the virtual environment, making errors repeatedly, and adjusting the learning strategy according to the reward values fed back by the environment, so that the cumulative reward obtained over the learning process is maximal. This optimizes the action taken at each step, so the path finally output is naturally the optimal path. When the reward fed back by the environment for an action executed by an agent is positive, the tendency to execute that action increases; otherwise, the tendency to execute it decreases.
In the initial state, the agents know nothing about the environment and need to learn independently, so the initial action selection of each agent is random. Once one round of reinforcement learning iteration has been completed in combination with the safety evacuation signs, the agents have accumulated some experience. Resource information is then shared, and each agent learns from the information obtained by the other agents as if it were its own experience. When a state already present in the obtained information is encountered in a later iteration, the agent can choose to execute the action with the maximum reward value and then update its own Q value.
As shown in Fig. 4, the detailed process of path planning in step 2, combining the safety evacuation signs with the Q-Learning algorithm, is:
Step 2.1: initialize the Q value of every agent in the Q-value table to 0;
Step 2.2: obtain the state information of each agent at the current time, calculate the corresponding reward, and move each agent by the action with the larger Q value;
Step 2.3: calculate the immediate reward of each agent after it moves to its new position, update the Q-value table, and judge whether the Q-value table has converged; if so, obtain the optimal path sequence; otherwise, go to the next step;
Step 2.4: receive and summarize the input environment information sent by each agent, together with its corresponding state, the action taken, the reward obtained, and the output environment information; then distribute the summarized information to every agent to realize information sharing, and return to step 2.2.
Reinforcement learning is an online learning method distinct from both supervised and unsupervised learning. The agent interacts with the environment by perceiving the state, selecting an action, and receiving a reward; the process is shown in Fig. 5. At each step, the agent observes the environment state, then selects and executes an action, which changes its state and earns a reward. Each exploration by the agent from the start point to the end point is called one iteration. The learning ability of the agent grows with each iteration, so after many iterations the strategy finally obtained is the optimal strategy. Q-Learning is one such reinforcement learning algorithm, defined as follows:
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$
In this formula, $r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1})$ is the actual Q value, denoted $Q_{real}(s_t,a_{t+1})$; $Q(s_t,a_t)$ is the estimated Q value, denoted $Q_{est}(s_t,a_{t+1})$; $\gamma$ is the discount factor of future rewards, with $0<\gamma<1$; $\alpha$ is the learning rate, with $0<\alpha<1$, which determines how much of the current error is learned; $s_t$ is the output state information at time t; $a_t$ is the action taken at time t; $r_t$ is the reward obtained at time t; $s_{t+1}$ is the output state information at time t+1; and $a_{t+1}$ is the action taken at time t+1.
The above formula can be written as:
$$Q_{new}(s_t,a_t) = Q_{old}(s_t,a_t) + \alpha\left(Q_{real}(s_t,a_{t+1}) - Q_{est}(s_t,a_{t+1})\right)$$
where $Q_{old}(s_t,a_t)$ denotes the old Q value and $Q_{new}(s_t,a_t)$ denotes the new Q value.
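As a small worked illustration of this update, the sketch below computes the new Q value from the old one. It uses the standard Q-Learning target, in which the actual Q value is the immediate reward plus the discounted maximum Q value of the next state. The function name `q_update` and the default learning rate 0.5 are assumptions; the default discount 0.8 is the value this embodiment gives for γ.

```python
def q_update(q_old: float, reward: float, max_q_next: float,
             alpha: float = 0.5, gamma: float = 0.8) -> float:
    """One Q-Learning update: Q_new = Q_old + alpha * (Q_real - Q_est).

    q_old      -- current estimate Q(s_t, a_t), i.e. Q_est
    reward     -- immediate reward r_t
    max_q_next -- largest Q value available in the next state s_{t+1}
    """
    q_real = reward + gamma * max_q_next  # actual Q value
    return q_old + alpha * (q_real - q_old)
```

For instance, with an old value of 1.0, a reward of 0, and a best next-state value of 2.0, the actual Q value is 0.8 * 2.0 = 1.6, the error is 0.6, and half of it is learned, so the new value is 1.3.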
This embodiment applies the safety evacuation signs and the reinforcement learning algorithm to path planning. In this process, the action set A of an agent is divided into three parts, the basic actions A1, the group action A2, and the optimal actions A3, expressed as A = (A1, A2, A3). The basic actions A1 are the eight short actions inherent to each agent, expressed as:
A1 = (up, down, left, right, ul, dl, ur, dr)
where up, down, left, right, ul, dl, ur, and dr refer to moving up, down, left, right, upper left, lower left, upper right, and lower right, respectively.
The group action A2 means that the agent follows the actions of the group leader. The optimal actions A3 are the eight long actions with which the agent follows the basic instructions of the safety evacuation signs, expressed as:
A3 = (forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l, turn-r)
where forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l, and turn-r refer to going straight, going left, going right, stopping, going straight or going back, going left or going right, turning left, and turning right, respectively.
The state set S represents each step taken by the agent.
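The action sets above can be written down as plain data. In the sketch below, the action names are taken from the text, but the mapping of each basic action to a (row, column) offset, with "up" decreasing the row index, and the helper `step` that keeps an agent in place at the grid boundary are assumptions made for illustration.

```python
# Basic actions A1 with assumed (row, column) offsets on the grid.
A1_MOVES = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
    "ul": (-1, -1), "dl": (1, -1), "ur": (-1, 1), "dr": (1, 1),
}
A1 = tuple(A1_MOVES)

# Optimal actions A3: the eight sign instructions an agent can follow.
A3 = ("forward", "go-l", "go-r", "stop",
      "fwd or dwbk", "go-l or go-r", "turn-l", "turn-r")


def step(cell: tuple[int, int], action: str,
         m_rows: int, n_cols: int) -> tuple[int, int]:
    """Apply a basic action; stay in place if the move would leave the grid."""
    dx, dy = A1_MOVES[action]
    x, y = cell[0] + dx, cell[1] + dy
    if 0 <= x < m_rows and 0 <= y < n_cols:
        return (x, y)
    return cell
```

On a 3*3 grid, `step((1, 1), "ul", 3, 3)` moves the agent to the top-left cell, while `step((0, 0), "up", 3, 3)` leaves it where it is.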
The learning process of motion planning, combining the safety evacuation signs with the Q-Learning algorithm, is as follows:
1) Initialize Q(s, a) to 0;
2) The agent observes the state information $s_t$ at time t;
3) According to the current state and the reward value $r_t$, the agent selects the action $a_t$ with the larger Q value and moves;
4) When the action selected by the agent acts on the environment, the environment state changes: the current position transitions to the next new position $s_{t+1}$, and an immediate reward $r_t$ is given, where $r_t$ is defined as follows: [reward definition missing in the source text]
5) Update the Q table:
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$
Here, the value of $\gamma$ is set to 0.8. Judge whether the Q-value table has converged; if so, stop the loop and obtain the optimal path sequence; otherwise, go to the next step;
6) Receive and summarize the input environment information sent by each agent, together with its corresponding state, the action taken, the reward obtained, and the output environment information; then distribute the summarized information to every agent to realize information sharing, and return to step 2).
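Steps 1) to 5) can be sketched for a single agent on a small grid as follows. This is an illustrative sketch under stated assumptions, not the implementation of the disclosure: the reward values (+10 at the exit, -1 for a blocked move, -0.1 per step, reduced to -0.05 when the action follows the sign in the current cell), the epsilon-greedy exploration, the four-direction action set, and the stopping rule (Q changes below `tol` for `patience` consecutive episodes) are all assumed, because the reward definition is not available in the text. Only γ = 0.8 comes from the embodiment.

```python
import random

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
NAMES = list(ACTIONS)


def train(rows, cols, start, goal, obstacles, signs, alpha=0.5, gamma=0.8,
          epsilon=0.2, episodes=2000, tol=1e-6, patience=20, seed=0):
    """Tabular Q-Learning; `signs` maps a cell to the action its sign suggests."""
    rng = random.Random(seed)
    # 1) initialise every Q(s, a) to 0
    q = {(x, y): dict.fromkeys(NAMES, 0.0)
         for x in range(rows) for y in range(cols)}
    stable = 0
    for _ in range(episodes):
        s, max_delta = start, 0.0
        for _ in range(4 * rows * cols):
            # 2)-3) observe the state, then pick an action (mostly the largest Q)
            if rng.random() < epsilon:
                a = rng.choice(NAMES)
            else:
                a = max(NAMES, key=q[s].get)
            dx, dy = ACTIONS[a]
            nxt = (s[0] + dx, s[1] + dy)
            blocked = (not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols)
                       or nxt in obstacles)
            if blocked:
                nxt = s
            # 4) immediate reward (assumed values, see the lead-in)
            if nxt == goal:
                r = 10.0
            elif blocked:
                r = -1.0
            else:
                r = -0.05 if signs.get(s) == a else -0.1
            # 5) Q-table update; the exit is terminal, so it has no future value
            future = 0.0 if nxt == goal else gamma * max(q[nxt].values())
            delta = alpha * (r + future - q[s][a])
            q[s][a] += delta
            max_delta = max(max_delta, abs(delta))
            s = nxt
            if s == goal:
                break
        # assumed convergence test: Q barely changed for several episodes
        stable = stable + 1 if max_delta < tol else 0
        if stable >= patience:
            break
    return q


def greedy_path(q, start, goal, obstacles, rows, cols):
    """Follow the largest Q value from start; stop at the goal or when blocked."""
    path, s = [start], start
    while s != goal and len(path) <= rows * cols:
        dx, dy = ACTIONS[max(NAMES, key=q[s].get)]
        nxt = (s[0] + dx, s[1] + dy)
        if (not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols)
                or nxt in obstacles):
            break
        path.append(nxt)
        s = nxt
    return path
```

A call such as `train(4, 4, (0, 0), (3, 3), {(1, 1)}, {(0, 0): "right"})` followed by `greedy_path` on the returned table reads out the learned route from the start cell to the exit. Step 6), the sharing of information among agents, is left out of this single-agent sketch.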
Since this embodiment simulates the real crowd movement of a shopping plaza, the crowd consists of numerous agents. An agent cannot exist in isolation, because individual movement does not match the group characteristics of people, and a single agent cannot complete the task efficiently in an evacuation scene: the limited scene resources it grasps slow down its learning, delay the output of the optimal path, and in the worst case make it impossible to complete the target task. Therefore, before the next reinforcement learning iteration, each agent outputs the environment information obtained through its own reinforcement learning to a headquarters information processor, which then sends the summarized information to every agent. This completes the information sharing among multiple agents, where the shared information includes strategies, experience, and environment states. Each agent then updates its own resources according to the information obtained from the headquarters information processor and, taking its own Q values and its own historical strategy into account, determines its action strategy for the next iteration, as shown in Fig. 6.
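One possible way to realize this sharing is sketched below: every agent reports its experiences as (state, action, reward, next state) tuples, the headquarters processor pools them, and each agent replays the pooled experiences through its own Q update. The pooling-and-replay rule, the function names `summarize` and `learn_from_shared`, and the learning rate 0.5 are assumptions; the disclosure only states which information is shared.

```python
from collections import defaultdict


def summarize(reports: dict) -> list:
    """Headquarters processor: pool the experiences reported by all agents.

    `reports` maps an agent id to a list of
    (state, action, reward, next_state) tuples.
    """
    pooled = []
    for agent_id in sorted(reports):
        pooled.extend(reports[agent_id])
    return pooled


def learn_from_shared(q, pooled, actions, alpha=0.5, gamma=0.8):
    """Replay the pooled experiences through one agent's own Q table.

    `q` is a defaultdict(float) keyed by (state, action).
    """
    for s, a, r, s_next in pooled:
        best_next = max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q
```

In this sketch an agent that has never visited a state can still obtain a Q value for it from another agent's report, which is one reading of how shared information enlarges the known region.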
This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state; together with the guidance of the safety evacuation signs, the optimal path in a complex environment can be found quickly.
This embodiment also uses multi-agent information sharing to expand the region over which environment information is known, which improves search efficiency and reduces the time needed to reach the destination.
Embodiment 2
As shown in Fig. 7, this embodiment provides a path planning system based on the combination of safety evacuation signs and reinforcement learning, comprising:
(1) a two-dimensional simulation scene model initialization module, configured to establish and rasterize a two-dimensional simulation scene model, and to initialize the obstacles, agents, and safety evacuation signs in the two-dimensional simulation scene model.
To improve realism, the virtual environment is built from the scene data of a real shopping plaza. The virtual environment is defined as a region of size M*N, which is then rasterized, and each grid cell is numbered. Each grid cell is denoted $(x_i, y_i)$, where $x_i$ is the row number of the cell and $y_i$ is its column number. M and N are positive integers.
In the two-dimensional simulation scene model initialization module, the process of initializing the obstacles, agents, and safety evacuation signs in the two-dimensional simulation scene model comprises:
defining each agent as a particle that has mass but no volume, and setting a circular region of preset radius centered on the agent as its collision detection region;
setting the number, positions, and occupied area of the obstacles;
setting the number, positions, occupied area, and instruction content of the safety evacuation signs. The two-dimensional modeling result is shown in Fig. 2.
The setting rules of the safety evacuation signs comprise:
The instruction setting rule of the safety evacuation signs, specifically as follows:
↑: go straight; ←: go left; →: go right; ×: no passage; [symbol missing]: may go forward or backward; [symbol missing]: may go left or right; [symbol missing]: turn left; [symbol missing]: turn right. Each safety evacuation sign and its corresponding instruction are stored in a database.
The position setting rule of the safety evacuation signs, specifically as follows:
a preset number of safety evacuation signs are placed in densely populated areas, at mall entrances and exits, and at mall corners, to prevent crowd congestion;
a preset number of safety evacuation signs are placed in remote areas, to prevent people from being trapped;
no-passage signs are placed at important sites with safety risks, such as machine rooms, and at off-limits areas.
Placement in other areas must comply with the general rules for setting safety signs.
For example:
In crowded areas, at entrances and exits, and at corners, more go-straight or turn-left/turn-right safety evacuation signs are set, so that the crowd can choose quickly there and avoid congestion. In remote areas, more go-straight or turn-left/turn-right safety evacuation signs are set, to prevent people from being trapped and unable to escape because they are unfamiliar with the paths. At specific positions that have safety risks or are not open to the public, no-passage safety evacuation signs are set to avoid accidents. Elsewhere in the scene, safety evacuation signs are set reasonably according to the actual scene, subject to the general rules for setting safety signs. The placement is shown in Fig. 3. In addition to the basic safety evacuation sign directions mentioned above, the figure also contains superpositions of the basic directions, which are not described one by one here.
Here, the densely populated areas and the remote areas are regions that simulate the actual scene. A densely populated area is a region whose pedestrian flow p exceeds a preset flow pt1. A remote area can be preset as a region whose pedestrian flow p is less than a preset flow pt2 and whose distance from the boundary of the two-dimensional simulation scene model does not exceed a preset distance, where pt2 is less than pt1.
(2) a path planning module, configured to perform path planning by combining the safety evacuation signs with the Q-Learning algorithm.
Reinforcement learning works mainly by having the agents keep trying in the virtual environment, making errors repeatedly, and adjusting the learning strategy according to the reward values fed back by the environment, so that the cumulative reward obtained over the learning process is maximal. This optimizes the action taken at each step, so the path finally output is naturally the optimal path. When the reward fed back by the environment for an action executed by an agent is positive, the tendency to execute that action increases; otherwise, the tendency to execute it decreases.
In the initial state, the agents know nothing about the environment and need to learn independently, so the initial action selection of each agent is random. Once one round of reinforcement learning iteration has been completed in combination with the safety evacuation signs, the agents have accumulated some experience. Resource information is then shared, and each agent learns from the information obtained by the other agents as if it were its own experience. When a state already present in the obtained information is encountered in a later iteration, the agent can choose to execute the action with the maximum reward value and then update its own Q value.
As shown in Fig. 8, the path planning module comprises:
(2.1) a Q-value table initialization module, configured to initialize the Q value of every agent in the Q-value table to 0;
(2.2) an agent moving module, configured to obtain the state information of each agent at the current time, calculate the corresponding reward, and move each agent by the action with the larger Q value;
(2.3) a Q-value table convergence judgment module, configured to calculate the immediate reward of each agent after it moves to its new position, update the Q-value table, and judge whether the Q-value table has converged; when the Q-value table has converged, the optimal path sequence is obtained;
(2.4) an information sharing module, configured to, when the Q-value table has not converged, receive and summarize the input environment information sent by each agent, together with its corresponding state, the action taken, the reward obtained, and the output environment information, and then distribute the summarized information to every agent to realize information sharing; the system then continues to move each agent according to the Q values, update the Q-value table, and judge whether the updated Q-value table has converged.
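The cooperation of sub-modules (2.1) to (2.4) can be pictured as one control loop. The skeleton below is a hypothetical sketch: the class name `PathPlanningModule`, the callable-based wiring, and the `max_rounds` safety limit are assumptions, and the four sub-modules are passed in as plain functions so that any concrete implementation can be plugged in.

```python
from typing import Callable


class PathPlanningModule:
    """Wires the four sub-modules (2.1)-(2.4) into one loop (illustrative)."""

    def __init__(self,
                 init_q: Callable[[], dict],
                 move_agents: Callable[[dict], list],
                 update_and_check: Callable[[dict, list], bool],
                 share_info: Callable[[dict, list], None],
                 max_rounds: int = 1000) -> None:
        self.init_q = init_q
        self.move_agents = move_agents
        self.update_and_check = update_and_check
        self.share_info = share_info
        self.max_rounds = max_rounds  # assumed guard against endless loops

    def run(self) -> dict:
        q = self.init_q()                              # (2.1) all Q values 0
        for _ in range(self.max_rounds):
            transitions = self.move_agents(q)          # (2.2) move by Q value
            if self.update_and_check(q, transitions):  # (2.3) update, converged?
                break
            self.share_info(q, transitions)            # (2.4) share, then repeat
        return q
```

The loop mirrors the description: sharing happens only while the Q-value table has not converged, and control then returns to the agent moving module.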
Reinforcement learning is an online learning method distinct from both supervised and unsupervised learning. The agent interacts with the environment by perceiving the state, selecting an action, and receiving a reward; the process is shown in Fig. 5. At each step, the agent observes the environment state, then selects and executes an action, which changes its state and earns a reward. Each exploration by the agent from the start point to the end point is called one iteration. The learning ability of the agent grows with each iteration, so after many iterations the strategy finally obtained is the optimal strategy. Q-Learning is one such reinforcement learning algorithm, defined as follows:
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$
In this formula, $r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1})$ is the actual Q value, denoted $Q_{real}(s_t,a_{t+1})$; $Q(s_t,a_t)$ is the estimated Q value, denoted $Q_{est}(s_t,a_{t+1})$; $\gamma$ is the discount factor of future rewards, with $0<\gamma<1$; $\alpha$ is the learning rate, with $0<\alpha<1$, which determines how much of the current error is learned; $s_t$ is the output state information at time t; $a_t$ is the action taken at time t; $r_t$ is the reward obtained at time t; $s_{t+1}$ is the output state information at time t+1; and $a_{t+1}$ is the action taken at time t+1.
The above formula can be written as:
$$Q_{new}(s_t,a_t) = Q_{old}(s_t,a_t) + \alpha\left(Q_{real}(s_t,a_{t+1}) - Q_{est}(s_t,a_{t+1})\right)$$
where $Q_{old}(s_t,a_t)$ denotes the old Q value and $Q_{new}(s_t,a_t)$ denotes the new Q value.
This embodiment applies safety evacuation signs and the reinforcement learning algorithm to path planning. In this process, the action set A of an agent is divided into three parts: basic actions A1, group actions A2, and optimal actions A3, expressed as A = (A1, A2, A3). The basic actions A1 are the eight short moves available to each agent, expressed as:
A1 = (up, down, left, right, ul, dl, ur, dr);
where up, down, left, right, ul, dl, ur, dr denote moving up, moving down, moving left, moving right, moving to the upper left, moving to the lower left, moving to the upper right, and moving to the lower right, respectively.
The group action A2 means that the agent follows the action of the group leader. The optimal actions A3 are the eight basic long moves that the agent performs by following the instructions of a safety evacuation sign, expressed as:
A3 = (forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l, turn-r)
where forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l, turn-r denote going straight, keeping to the left, keeping to the right, stopping, going straight or turning back, keeping to the left or to the right, turning left, and turning right, respectively.
The state set S represents each step taken by the agent.
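For illustration, the action set A = (A1, A2, A3) could be represented as in the following Python sketch. The grid offsets attached to the basic actions, the label "follow_leader", and the helper apply_basic_action are assumptions of this sketch and are not specified in the embodiment.

```python
# Basic actions A1: eight short moves. The (dx, dy) grid offsets are an
# illustrative assumption (x grows to the right, y grows upward).
A1 = {
    "up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0),
    "ul": (-1, 1), "dl": (-1, -1), "ur": (1, 1), "dr": (1, -1),
}

# Group action A2: follow the group leader (hypothetical single label).
A2 = ("follow_leader",)

# Optimal actions A3: the eight long moves indicated by a safety evacuation sign.
A3 = ("forward", "go-l", "go-r", "stop",
      "fwd or dwbk", "go-l or go-r", "turn-l", "turn-r")

# Full action set A = (A1, A2, A3).
A = (tuple(A1), A2, A3)

def apply_basic_action(position, name):
    """Return the grid cell reached from `position` by one basic action."""
    dx, dy = A1[name]
    return (position[0] + dx, position[1] + dy)

new_position = apply_basic_action((2, 2), "ul")  # one cell left, one cell up
```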
The learning process for motion planning that combines safety evacuation signs with the Q-learning algorithm is as follows:
1) initialize Q(s, a) to 0;
2) the agent observes the state information s_t at time t;
3) according to the current state and the reward value r_t, the agent selects the action a_t with the larger Q value and moves;
4) when the action selected by the agent acts on the environment, the environment state changes: the current position transitions to the next new position s_{t+1}, and an immediate reward r_t is given, where r_t is defined as follows:
5) update the Q table according to the Q-learning update formula given above, with γ set to 0.8; then judge whether the Q-value table has converged; if so, stop the loop and obtain the optimal path sequence; otherwise go to the next step;
6) receive and aggregate the input environment information sent by each agent together with its corresponding state, the action taken, the reward obtained, and the output environment information; then distribute the aggregated information to each agent to realize information sharing, and return to step 2).
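Steps 1) to 5) can be sketched for a single agent as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the 4 x 4 grid, the start and exit cells, the reward values (+100 on reaching the exit, -1 otherwise), the learning rate α = 0.5, and the convergence tolerance are all hypothetical, because the reward definition is not reproduced in the text. Only γ = 0.8 is taken from the description. The information sharing of step 6) is omitted here.

```python
# Hypothetical grid world; reward values are assumptions (see the lead-in).
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0),
         "ul": (-1, 1), "dl": (-1, -1), "ur": (1, 1), "dr": (1, -1)}
SIZE, START, GOAL = 4, (0, 0), (3, 3)

def step(state, action):
    """Environment transition: return (next_state, reward)."""
    dx, dy = MOVES[action]
    nxt = (state[0] + dx, state[1] + dy)
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE):
        nxt = state  # a move off the grid leaves the agent in place
    return nxt, (100.0 if nxt == GOAL else -1.0)

def greedy_action(q, state):
    """Step 3: pick the action with the largest Q value (first one wins ties)."""
    return max(MOVES, key=lambda a: q[(state, a)])

def train(alpha=0.5, gamma=0.8, tol=1e-6, max_episodes=2000, max_steps=200):
    # Step 1: initialise Q(s, a) to 0 for every state-action pair.
    q = {((x, y), a): 0.0
         for x in range(SIZE) for y in range(SIZE) for a in MOVES}
    for episode in range(1, max_episodes + 1):
        state, largest_change = START, 0.0           # Step 2: observe s_t
        for _ in range(max_steps):
            action = greedy_action(q, state)          # Step 3: select a_t
            nxt, reward = step(state, action)         # Step 4: s_{t+1}, r_t
            best_next = 0.0 if nxt == GOAL else max(q[(nxt, a)] for a in MOVES)
            change = alpha * (reward + gamma * best_next - q[(state, action)])
            q[(state, action)] += change              # Step 5: update Q table
            largest_change = max(largest_change, abs(change))
            state = nxt
            if state == GOAL:
                break
        # Step 5 (continued): treat the table as converged once an episode
        # reaches the exit without changing any visited Q value by more than tol.
        if state == GOAL and largest_change < tol:
            return q, episode
    return q, None

def greedy_path(q, max_steps=200):
    """Read out the learned path by following the largest Q value from START."""
    path = [START]
    while path[-1] != GOAL and len(path) <= max_steps:
        path.append(step(path[-1], greedy_action(q, path[-1]))[0])
    return path

q_table, converged_at = train()
path = greedy_path(q_table)
```

Because every non-exit move is penalised in this sketch and the table starts at 0, actions that have not yet been tried initially look better than tried ones, so the purely greedy selection of step 3) still explores the grid before settling on a path.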
Since this embodiment simulates the real crowd movement of a shopping plaza, the crowd consists of a large number of agents. An agent cannot exist in isolation, because individual movement does not match the group characteristics of people, and a single agent cannot complete the task efficiently in an evacuation scene: the limited scene resources it holds slow down its learning, lengthen the time needed to output the optimal path, and in the worst case prevent it from completing the target task at all. Therefore, before the next reinforcement learning iteration is carried out, each agent outputs the environment information obtained through its own reinforcement learning to the headquarters information processor, and the headquarters information processor then issues the aggregated information to every agent. This completes the information sharing among the multiple agents, where the shared information includes policies, experience, and environment states. Each agent then updates its own resources according to the information obtained from the headquarters information processor and, while also taking into account its own Q values and its own historical policy, determines its action policy for the next iteration, as shown in Figure 6.
In a specific implementation, the information sharing module includes two parts: the main processor of each agent and the headquarters information controller. The main processor of an agent is used to input environment information (such as the distance and angle between the agent and the obstacles and safety evacuation signs in the current state, and the content information of the safety evacuation signs), to output the state s_t, the action taken a_t, the reward obtained r_t, and the environment information, and to manage the information it has obtained. The headquarters information processor is used to aggregate the information shared by each agent and then distribute it to every agent again, thereby realizing information sharing so that the next iteration can proceed quickly, as shown in Figure 9.
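The exchange between the agents' main processors and the headquarters information processor can be illustrated with the following Python sketch. The class names Report, Headquarters, and Agent, and the rule that each agent replays the pooled reports through the Q update, are assumptions of this sketch: the embodiment states that the information is aggregated and redistributed but does not specify how an agent merges it. The additional environment information (distances, angles, sign content) is left out for brevity.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Report:
    """One agent's output for a step: s_t, a_t, r_t and the resulting state."""
    agent_id: int
    state: tuple
    action: str
    reward: float
    next_state: tuple

@dataclass
class Headquarters:
    """Collects reports from all agents and hands the pooled list back."""
    inbox: list = field(default_factory=list)

    def receive(self, report):
        self.inbox.append(report)

    def distribute(self):
        pooled, self.inbox = self.inbox, []
        return pooled

@dataclass
class Agent:
    agent_id: int
    q: dict = field(default_factory=dict)

    def learn_from(self, reports, actions, alpha=0.5, gamma=0.8):
        """Replay pooled experience (own and others') through the Q update."""
        for r in reports:
            best_next = max(self.q.get((r.next_state, a), 0.0) for a in actions)
            old = self.q.get((r.state, r.action), 0.0)
            self.q[(r.state, r.action)] = old + alpha * (
                r.reward + gamma * best_next - old)

# Two agents report one step each; both then learn from the pooled reports.
actions = ("up", "right")
hq = Headquarters()
agents = [Agent(0), Agent(1)]
hq.receive(Report(0, (0, 0), "up", -1.0, (0, 1)))
hq.receive(Report(1, (2, 3), "right", 100.0, (3, 3)))
pooled = hq.distribute()
for agent in agents:
    agent.learn_from(pooled, actions)
```

After this exchange, agent 0 holds a Q value for the cell (2, 3) that it never visited itself, which is the effect of enlarging the known region that the information sharing is meant to achieve.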
This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agent continually learns to perceive the environment state and, aided by the guiding function of the safety evacuation signs, can quickly find the optimal path in a complex environment.
This embodiment also adopts multi-agent information sharing, which enlarges the region over which environment information is known, improves search efficiency, and reduces the time needed to reach the destination.
Embodiment 3
This embodiment provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning as shown in Figure 1.
This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agent continually learns to perceive the environment state and, aided by the guiding function of the safety evacuation signs, can quickly find the optimal path in a complex environment.
This embodiment also adopts multi-agent information sharing, which enlarges the region over which environment information is known, improves search efficiency, and reduces the time needed to reach the destination.
Embodiment 4
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning as shown in Figure 1.
This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agent continually learns to perceive the environment state and, aided by the guiding function of the safety evacuation signs, can quickly find the optimal path in a complex environment.
This embodiment also adopts multi-agent information sharing, which enlarges the region over which environment information is known, improves search efficiency, and reduces the time needed to reach the destination.
Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The foregoing describes merely preferred embodiments of the present disclosure and is not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.