CN109974737A - Route planning method and system based on combination of safety evacuation signs and reinforcement learning - Google Patents

Route planning method and system based on combination of safety evacuation signs and reinforcement learning

Info

Publication number
CN109974737A
Authority
CN
China
Prior art keywords
agent
safety evacuation
value table
sign
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910289774.3A
Other languages
Chinese (zh)
Other versions
CN109974737B (en
Inventor
吕蕾
周丽美
赵修凯
吕晨
张桂娟
刘弘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Center Information Technology Ltd By Share Ltd
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN201910289774.3A
Publication of CN109974737A
Priority to LU101606A
Application granted
Publication of CN109974737B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G01C21/206 Instruments for performing navigational calculations specially adapted for indoor navigation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/3446 Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags, using precalculated routes

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Alarm Systems (AREA)

Abstract

The disclosure provides a route planning method and system based on combination of safety evacuation signs and reinforcement learning. The path planning method comprises the steps of establishing and rasterizing a two-dimensional simulation scene model, and initializing obstacles, intelligent agents and safety evacuation indication marks in the two-dimensional simulation scene model; the route planning is carried out by combining the safe evacuation indicator and the Q-Learning algorithm, and the specific process is as follows: initializing the Q value corresponding to each agent in the Q value table to be 0; acquiring the state information of each intelligent agent at the current moment, calculating corresponding rewards, and selecting actions with large corresponding Q values to move each intelligent agent; calculating the instant reward of each agent moving to a new position, updating the Q value table, judging whether the Q value table is converged, and if so, obtaining an optimal path sequence; otherwise, receiving and summarizing the input environment information and the corresponding state, the action made, the obtained reward and the output environment information sent by each intelligent agent, distributing the summarized information to each intelligent agent, and continuously moving each intelligent agent.

Description

Route planning method and system based on combination of safety evacuation signs and reinforcement learning
Technical field
The present disclosure belongs to the field of path planning, and in particular relates to a path planning method and system based on the combination of safety evacuation signs and reinforcement learning.
Background
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
In recent years, with the rapid development of urbanization in China, the number and scale of buildings in urban public places have kept expanding, which means that the safety pressure to be borne keeps increasing. How to quickly and realistically simulate the evacuation paths of a crowd when an emergency occurs in a public place has become a major problem to be solved urgently. Simulating crowd evacuation paths can help safety departments predict the evacuation process of a crowd when an emergency occurs, and then propose effective motion planning solutions that shorten evacuation time and reduce casualties.
The inventors found that the relatively mature motion planning approaches currently available include the A-star algorithm, the artificial potential field algorithm, cellular automata, simulated annealing, genetic algorithms and reinforcement learning algorithms. These approaches cannot quickly adapt to and learn a complex environment and respond in a timely manner, which leads to low path planning efficiency and poor accuracy.
Summary of the invention
To solve the above problems, a first aspect of the present disclosure provides a path planning method based on the combination of safety evacuation signs and reinforcement learning. The method combines safety evacuation signs with reinforcement learning and does not depend on an environment model: through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state and, with the guidance of the safety evacuation indication signs, can quickly find the optimal path in a complex environment.
To achieve the above object, the present disclosure adopts the following technical solution:
A path planning method based on the combination of safety evacuation signs and reinforcement learning, comprising:
Step 1: establishing and rasterizing a two-dimensional simulation scene model, and initializing the obstacles, agents and safety evacuation indication signs in the two-dimensional simulation scene model;
Step 2: performing path planning by combining the safety evacuation indication signs with the Q-Learning algorithm;
The specific process of Step 2 is:
Step 2.1: initializing the Q value corresponding to each agent in the Q value table to 0;
Step 2.2: acquiring the state information of each agent at the current moment, calculating the corresponding reward, and selecting the action with a large corresponding Q value to move each agent;
Step 2.3: calculating the immediate reward of each agent that has moved to a new position, updating the Q value table, and judging whether the Q value table has converged; if so, obtaining the optimal path sequence; otherwise, proceeding to the next step;
Step 2.4: receiving and aggregating the input environment information sent by each agent together with its corresponding state, the action taken, the reward obtained and the output environment information, then distributing the aggregated information to each agent to achieve information sharing, and returning to Step 2.2.
To solve the above problems, a second aspect of the present disclosure provides a path planning system based on the combination of safety evacuation signs and reinforcement learning. The system combines safety evacuation signs with reinforcement learning and does not depend on an environment model: through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state and, with the guidance of the safety evacuation indication signs, can quickly find the optimal path in a complex environment.
To achieve the above object, the present disclosure adopts the following technical solution:
A path planning system based on the combination of safety evacuation signs and reinforcement learning, comprising:
a two-dimensional simulation scene model initialization module, configured to establish and rasterize a two-dimensional simulation scene model and to initialize the obstacles, agents and safety evacuation indication signs in the two-dimensional simulation scene model;
a path planning module, configured to perform path planning by combining the safety evacuation indication signs with the Q-Learning algorithm;
The path planning module comprises:
a Q value table initialization module, configured to initialize the Q value corresponding to each agent in the Q value table to 0;
an agent movement module, configured to acquire the state information of each agent at the current moment, calculate the corresponding reward, and select the action with a large corresponding Q value to move each agent;
a Q value table convergence judgment module, configured to calculate the immediate reward of each agent that has moved to a new position, update the Q value table, and judge whether the Q value table has converged; when the Q value table has converged, the optimal path sequence is obtained;
an information sharing module, configured, when the Q value table has not converged, to receive and aggregate the input environment information sent by each agent together with its corresponding state, the action taken, the reward obtained and the output environment information, and then to distribute the aggregated information to each agent to achieve information sharing; each agent then continues to move according to the Q values, the Q value table is updated, and the updated Q value table is checked for convergence.
To solve the above problems, a third aspect of the present disclosure provides a computer-readable storage medium. It combines safety evacuation signs with reinforcement learning and does not depend on an environment model: through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state and, with the guidance of the safety evacuation indication signs, can quickly find the optimal path in a complex environment.
To achieve the above object, the present disclosure adopts the following technical solution:
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning described above.
To solve the above problems, a fourth aspect of the present disclosure provides a computer device. It combines safety evacuation signs with reinforcement learning and does not depend on an environment model: through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state and, with the guidance of the safety evacuation indication signs, can quickly find the optimal path in a complex environment.
To achieve the above object, the present disclosure adopts the following technical solution:
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning described above.
The beneficial effects of the present disclosure are:
(1) The disclosure combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state and, with the guidance of the safety evacuation indication signs, can quickly find the optimal path in a complex environment.
(2) Because prior knowledge is lacking, the paths that reinforcement learning finds in the initial iterations are often not optimal. To address this, multi-agent information sharing is used to enlarge the region over which environment information is known, which improves search efficiency and reduces the time needed to reach the destination.
Brief description of the drawings
The accompanying drawings, which form a part of the present disclosure, are provided for further understanding of the disclosure. The illustrative embodiments of the disclosure and their descriptions are used to explain the disclosure and do not constitute an improper limitation of it.
Fig. 1 is a flowchart of a path planning method based on the combination of safety evacuation signs and reinforcement learning provided by an embodiment of the present disclosure.
Fig. 2 is a two-dimensional modeling effect diagram provided by an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of the position setting of the safety evacuation indication signs provided by an embodiment of the present disclosure.
Fig. 4 is a process diagram of path planning combining the safety evacuation indication signs with the Q-Learning algorithm provided by an embodiment of the present disclosure.
Fig. 5 is a process diagram of the interaction between the agent and the environment provided by an embodiment of the present disclosure.
Fig. 6 is a schematic diagram of agent information sharing provided by an embodiment of the present disclosure.
Fig. 7 is a structural schematic diagram of a path planning system based on the combination of safety evacuation signs and reinforcement learning provided by an embodiment of the present disclosure.
Fig. 8 is a structural schematic diagram of the path planning module provided by an embodiment of the present disclosure.
Fig. 9 is a principle diagram of the information sharing module provided by an embodiment of the present disclosure.
Detailed description of the embodiments
The present disclosure is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the disclosure. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the disclosure belongs.
It should be noted that the terminology used herein is for describing specific embodiments only and is not intended to limit the exemplary embodiments according to the disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should further be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
Embodiment 1
As shown in Fig. 1, the path planning method based on the combination of safety evacuation signs and reinforcement learning of this embodiment comprises:
Step 1: establishing and rasterizing a two-dimensional simulation scene model, and initializing the obstacles, agents and safety evacuation indication signs in the two-dimensional simulation scene model.
To improve realism, the virtual environment is built from the scene data of a real shopping mall. The virtual environment is defined as a region of size M*N, which is then rasterized, and each grid cell is numbered. Each grid cell is denoted $(x_i, y_i)$, where $x_i$ is the row number of the cell and $y_i$ is its column number. M and N are positive integers.
In Step 1, the process of initializing the obstacles, agents and safety evacuation indication signs in the two-dimensional simulation scene model comprises:
defining each agent as a particle that has mass but no volume, and setting a circular region of preset radius centered on the agent as its collision detection region;
setting the number, positions and occupied area sizes of the obstacles;
setting the number, positions, occupied area sizes and instruction content of the safety evacuation indication signs; the two-dimensional modeling effect is shown in Fig. 2.
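The scene initialization in Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the class name `Scene`, its methods, the row-major cell numbering and the rectangular obstacle shape are all hypothetical choices made for the example.

```python
import math

FREE, OBSTACLE = 0, 1

class Scene:
    """Hypothetical M*N rasterized scene; cell (x, y) = (row, column)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.grid = [[FREE] * cols for _ in range(rows)]
        self.signs = {}    # (x, y) -> instruction content of the sign
        self.agents = []   # each agent is a point mass with a position

    def cell_number(self, x, y):
        # Row-major numbering of grid cells (an assumed numbering scheme).
        return x * self.cols + y

    def add_obstacle(self, top, left, height, width):
        # Mark a rectangular block of cells as occupied by an obstacle.
        for x in range(top, top + height):
            for y in range(left, left + width):
                self.grid[x][y] = OBSTACLE

    def add_sign(self, cell, instruction):
        self.signs[cell] = instruction

    def add_agent(self, position, mass=1.0):
        self.agents.append({"pos": position, "mass": mass})

def in_collision_region(agent_pos, other_pos, radius):
    # True when other_pos lies inside the circular collision detection
    # region of preset radius centered on the agent.
    return math.dist(agent_pos, other_pos) <= radius

scene = Scene(4, 5)
scene.add_obstacle(top=1, left=1, height=2, width=2)
scene.add_sign((0, 4), "go-l")
scene.add_agent((0, 0))
```

Keeping signs in a dictionary keyed by cell makes it cheap to check whether an agent is standing on a sign when it selects its next action.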
The setting rules for the safety evacuation indication signs comprise:
Command setting rules for the safety evacuation indication signs, specifically as follows:
↑: go straight; ←: go left; →: go right; ×: no passage. Four further signs, whose symbols are not reproduced in this text, indicate respectively: may go forward or backward; may go left or right; turn left; turn right. The safety evacuation indication signs and their commands are stored in a database in one-to-one correspondence.
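The sign-to-command correspondence stored in a database could look like the sketch below. The table name, column names and the use of an in-memory SQLite database are assumptions for illustration; the four keys written in capitals are hypothetical placeholders for the signs whose symbols are not available in this text.

```python
import sqlite3

# (sign, command) pairs; the upper-case keys are hypothetical stand-ins
# for signs whose symbols are not reproduced in the text.
SIGN_COMMANDS = [
    ("↑", "go straight"),
    ("←", "go left"),
    ("→", "go right"),
    ("×", "no passage"),
    ("FWD_OR_BACK", "may go forward or backward"),
    ("LEFT_OR_RIGHT", "may go left or right"),
    ("TURN_LEFT", "turn left"),
    ("TURN_RIGHT", "turn right"),
]

def build_sign_db():
    # In-memory database holding one row per sign.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sign_command (sign TEXT PRIMARY KEY, command TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sign_command VALUES (?, ?)", SIGN_COMMANDS)
    conn.commit()
    return conn

def lookup(conn, sign):
    # Return the command for a sign, or None if the sign is unknown.
    row = conn.execute(
        "SELECT command FROM sign_command WHERE sign = ?", (sign,)
    ).fetchone()
    return row[0] if row else None

conn = build_sign_db()
```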
Position setting rules for the safety evacuation indication signs, specifically as follows:
A preset number of safety evacuation indication signs are placed in densely populated areas, at mall entrances and exits, and at mall corners to prevent crowd congestion;
A preset number of safety evacuation indication signs are placed in remote areas to prevent people from being trapped;
No-passage signs are placed at important machine-room locations with safety hazards and in off-limits areas.
Placement in other areas must comply with the general rules for setting safety signs.
For example:
In crowded areas, at entrances and exits, and at corners, several go-straight or left/right-turn signs are set so that the crowd can choose quickly there and avoid congestion; in remote areas, several go-straight or left/right-turn signs are set to prevent people from being trapped and unable to escape because they are unfamiliar with the paths; at specific locations that have safety hazards or are not open to the public, no-passage signs are set to avoid accidents; elsewhere in the scene, signs are set reasonably according to the real scene and must comply with the general rules for setting safety signs. The position setting is shown in Fig. 3. In addition to the basic sign directions mentioned above, the figure also contains superposed combinations of the basic directions, which are not described one by one here.
Densely populated areas and remote areas are regions that simulate the actual scene. A densely populated area is a region whose pedestrian flow p exceeds a preset flow pt1; a remote area may be preset as a region whose pedestrian flow p is less than a preset flow pt2 and whose distance from the boundary of the two-dimensional simulation scene model does not exceed a preset distance, where pt2 is less than pt1.
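The two area definitions above reduce to a pair of threshold tests. The following sketch assumes hypothetical parameter names (`flow`, `boundary_distance`, `max_distance`) and a third label, "other", for regions that are neither densely populated nor remote.

```python
def classify_area(flow, boundary_distance, pt1, pt2, max_distance):
    """Classify a region by pedestrian flow p and distance to the scene boundary.

    pt1, pt2 and max_distance are the preset thresholds; pt2 must be below pt1.
    """
    if pt2 >= pt1:
        raise ValueError("pt2 must be less than pt1")
    if flow > pt1:
        return "dense"      # flow exceeds pt1
    if flow < pt2 and boundary_distance <= max_distance:
        return "remote"     # low flow and close to the model boundary
    return "other"
```

For example, with pt1 = 50, pt2 = 5 and a preset distance of 3 cells, a region with flow 2 that lies 1 cell from the boundary is classified as remote.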
Step 2: performing path planning by combining the safety evacuation indication signs with the Q-Learning algorithm.
In reinforcement learning, the agent keeps trying and making mistakes in the virtual environment and adjusts its learning policy using the reward values fed back by the environment, so that the cumulative reward obtained over the learning process is maximized. This optimizes the action taken at each step, so the final output path is naturally the optimal path. When the reward value fed back by the environment for an action executed by the agent is positive, the tendency to execute that action increases; otherwise, the tendency to execute that action decreases.
In the initial state, the agents know nothing about the environment and must learn independently, so the initial action selection of each agent is random. After one round of reinforcement learning iteration combined with the safety evacuation indication signs is completed, the agents have accumulated some experience and then share their resource information. Each agent treats the information obtained from the other agents as its own experience and learns from it: when it later encounters a state that also appears in the shared information, it can choose to execute the action with the largest reward value and then update its own Q value.
As shown in Fig. 4, the specific process of performing path planning by combining the safety evacuation indication signs with the Q-Learning algorithm in Step 2 is:
Step 2.1: initializing the Q value corresponding to each agent in the Q value table to 0;
Step 2.2: acquiring the state information of each agent at the current moment, calculating the corresponding reward, and selecting the action with a large corresponding Q value to move each agent;
Step 2.3: calculating the immediate reward of each agent that has moved to a new position, updating the Q value table, and judging whether the Q value table has converged; if so, obtaining the optimal path sequence; otherwise, proceeding to the next step;
Step 2.4: receiving and aggregating the input environment information sent by each agent together with its corresponding state, the action taken, the reward obtained and the output environment information, then distributing the aggregated information to each agent to achieve information sharing, and returning to Step 2.2.
Reinforcement learning is an online learning method that differs from both supervised learning and unsupervised learning. The agent interacts with the environment by perceiving states, selecting actions and receiving rewards; the process is shown in Fig. 5. At each step, the agent observes the environment state, selects and executes an action, and thereby changes its state and obtains a reward. Each exploration by the agent from the start point to the end point is called one iteration. Over many iterations the learning ability of the agent grows steadily stronger, so that the optimal policy is finally obtained. Q-Learning is one such reinforcement learning algorithm and is defined as follows:
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$
where $r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1})$ is the real Q value, denoted $Q_{real}(s_t,a_{t+1})$;
$Q(s_t,a_t)$ is the estimated Q value, denoted $Q_{est}(s_t,a_{t+1})$; $\gamma$ is the decay factor for future rewards, with $0<\gamma<1$; $\alpha$ is the learning rate, with $0<\alpha<1$, which determines how much of the current error is learned; $s_t$ is the output state information at time t, $a_t$ is the action taken at time t, $r_t$ is the reward obtained at time t, $s_{t+1}$ is the output state information at time t+1, and $a_{t+1}$ is the action taken at time t+1.
The above formula can be written as:
$$Q_{new}(s_t,a_t) = Q_{old}(s_t,a_t) + \alpha\left(Q_{real}(s_t,a_{t+1}) - Q_{est}(s_t,a_{t+1})\right)$$
where $Q_{old}(s_t,a_t)$ denotes the old Q value and $Q_{new}(s_t,a_t)$ denotes the new Q value.
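A single application of this update can be sketched in a few lines, reading the real Q value as the standard Q-Learning target (reward plus the discounted maximum next-state Q value). The dictionary-based Q table, the function name `q_update` and the default learning rate of 0.5 are assumptions for the example; the decay value 0.8 is the one this embodiment uses later in the learning process.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.8):
    """Apply Q_new = Q_old + alpha * (Q_real - Q_est) to one table entry.

    q maps (state, action) pairs to Q values; missing entries count as 0.
    """
    q_est = q.get((state, action), 0.0)
    # Q_real: immediate reward plus the decayed best Q value of the next state.
    q_real = reward + gamma * max(q.get((next_state, a), 0.0) for a in actions)
    q[(state, action)] = q_est + alpha * (q_real - q_est)
    return q[(state, action)]

q_table = {}
first = q_update(q_table, "s0", "up", 1.0, "s1", ["up", "down"])
q_table[("s1", "down")] = 2.0
second = q_update(q_table, "s0", "up", 1.0, "s1", ["up", "down"])
```

In the first call every entry is still 0, so the new value is 0.5 * 1.0 = 0.5. In the second call the real Q value is 1.0 + 0.8 * 2.0 = 2.6, and the entry moves halfway from 0.5 toward it.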
This embodiment applies the safety evacuation indication signs and the reinforcement learning algorithm to path planning. In this process, the action set A of an agent is divided into three parts: basic actions A1, group actions A2 and optimal actions A3, expressed as A = (A1, A2, A3). The basic actions A1 are the eight short actions of each agent, expressed as: A1 = (up, down, left, right, ul, dl, ur, dr);
where up, down, left, right, ul, dl, ur and dr refer respectively to moving up, moving down, moving left, moving right, moving to the upper left, moving to the lower left, moving to the upper right and moving to the lower right.
The group action A2 means that the agent follows the action of the group leader. The optimal actions A3 are the eight basic long actions with which the agent follows the instructions of the safety evacuation indication signs, expressed as:
A3 = (forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l, turn-r). The state set S represents each step taken by the agent.
Here forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l and turn-r refer respectively to going straight, going left, going right, stopping, going straight or going back, going left or going right, turning left and turning right.
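One way to represent the three parts of the action set in code is sketched below. The grid offsets attached to the basic actions assume that row numbers grow downward, and the label `follow-leader` for the group action is hypothetical; neither is specified in the text.

```python
# A1: eight short basic actions as (row offset, column offset).
# Assumes row numbers grow downward, so "up" decreases the row index.
BASIC_ACTIONS = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
    "ul": (-1, -1), "dl": (1, -1), "ur": (-1, 1), "dr": (1, 1),
}

# A2: the agent follows the group leader (hypothetical label).
GROUP_ACTION = "follow-leader"

# A3: eight long actions that follow the safety evacuation indication signs.
SIGN_ACTIONS = (
    "forward", "go-l", "go-r", "stop",
    "fwd or dwbk", "go-l or go-r", "turn-l", "turn-r",
)

def apply_basic_action(cell, action, rows, cols):
    """Move one cell in the given direction, staying put at the scene boundary."""
    d_row, d_col = BASIC_ACTIONS[action]
    row, col = cell[0] + d_row, cell[1] + d_col
    if 0 <= row < rows and 0 <= col < cols:
        return (row, col)
    return cell
```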
The learning process of motion planning combining the safety evacuation indication signs with the Q-Learning algorithm is as follows:
1) Initialize Q(s, a) to 0;
2) The agent observes the state information $s_t$ at time t;
3) According to the current state and the reward value $r_t$, the agent selects the action $a_t$ with a large Q value and moves;
4) When the action selected by the agent acts on the environment, the environment state changes:
that is, the current position transitions to the next new position $s_{t+1}$ and an immediate reward $r_t$ is given, where $r_t$ is defined as follows: [definition not reproduced in this text]
5) Update the Q table: $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$. Here the value of $\gamma$ is set to 0.8. Judge whether the Q value table has converged; if so, stop the loop and obtain the optimal path sequence; otherwise, proceed to the next step;
6) Receive and aggregate the input environment information sent by each agent together with its corresponding state, the action taken, the reward obtained and the output environment information, then distribute the aggregated information to each agent to achieve information sharing, and return to step 2).
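Steps 1) to 5) of this loop can be sketched for a single agent as follows. Several parts are assumptions rather than the method of the disclosure: the reward values (+10 at the goal, -0.1 per step, -1 for a blocked move) are hypothetical because the reward definition is not available in this text, only four of the eight basic actions are used, ties between equal Q values are broken at random, and "converged" is taken to mean that no Q value changed by more than a small tolerance during one iteration. The information-sharing step 6) is omitted here.

```python
import random

# Four cardinal moves only, to keep the sketch short (rows grow downward).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def plan_path(rows, cols, start, goal, obstacles,
              alpha=0.5, gamma=0.8, tol=1e-4, max_episodes=5000, seed=0):
    rng = random.Random(seed)
    # 1) Initialize Q(s, a) to 0 for every cell and action.
    q = {((r, c), a): 0.0
         for r in range(rows) for c in range(cols) for a in ACTIONS}

    def move(state, action):
        # Hypothetical environment: returns (next state, immediate reward).
        d_row, d_col = ACTIONS[action]
        nxt = (state[0] + d_row, state[1] + d_col)
        inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
        if not inside or nxt in obstacles:
            return state, -1.0          # blocked: stay in place
        return nxt, (10.0 if nxt == goal else -0.1)

    def best_action(state):
        # Action with the largest Q value, ties broken at random.
        top = max(q[(state, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(state, a)] == top])

    step_cap = 4 * rows * cols
    for _ in range(max_episodes):
        state, largest_change = start, 0.0      # 2) observe the state
        for _ in range(step_cap):
            action = best_action(state)         # 3) select the action
            nxt, reward = move(state, action)   # 4) environment responds
            target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            change = alpha * (target - q[(state, action)])
            q[(state, action)] += change        # 5) update the Q table
            largest_change = max(largest_change, abs(change))
            state = nxt
            if state == goal:
                break
        if largest_change < tol:                # convergence check
            break

    # Read the path sequence off the learned Q table.
    path, state = [start], start
    while state != goal and len(path) <= step_cap:
        state, _ = move(state, best_action(state))
        path.append(state)
    return path

blocked = {(1, 1), (1, 2)}
route = plan_path(4, 4, start=(0, 0), goal=(3, 3), obstacles=blocked)
```

Because action selection here is purely greedy, the path read off the table is a path the agent has learned to follow reliably, but it is not guaranteed to be the shortest one; the sign guidance and information sharing described in this embodiment are what the disclosure relies on to improve on that.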
Because this embodiment simulates the real crowd movement of a shopping mall, the crowd consists of numerous agents. An agent cannot exist in isolation: individual movement does not match the group characteristics of people, and a single agent cannot complete its task efficiently in an evacuation scene. The limited scene resources it holds slow its learning, lengthen the time needed to output the optimal path, and in the worst case prevent it from completing the target task at all. Therefore, before the next reinforcement learning iteration, each agent outputs the environment information obtained from its own reinforcement learning to a headquarters information processor, which then issues the aggregated information to every agent. This completes the information sharing among the multiple agents, where the shared information includes policies, experience and environment states. Each agent then updates its own resources according to the information obtained from the headquarters information processor and, also taking into account its own Q values and its own historical policy, determines its action policy for the next iteration, as shown in Fig. 6.
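The sharing round described above can be sketched as a small collect-and-broadcast exchange. Everything in this example is an assumption made for illustration: the class names `Headquarters` and `Agent`, the representation of shared information as (state, action, reward, next state) tuples, and the choice to let each agent replay the aggregated experience through its own Q table.

```python
class Headquarters:
    """Hypothetical headquarters information processor."""

    def __init__(self):
        self.pool = []

    def receive(self, experiences):
        # Collect the experience tuples reported by one agent.
        self.pool.extend(experiences)

    def distribute(self):
        # Aggregate: drop duplicates while keeping the reporting order.
        return list(dict.fromkeys(self.pool))


class Agent:
    def __init__(self, actions, alpha=0.5, gamma=0.8):
        self.actions, self.alpha, self.gamma = actions, alpha, gamma
        self.q = {}            # (state, action) -> Q value
        self.experiences = []  # (state, action, reward, next_state)

    def learn(self, experiences):
        # Treat shared experience as the agent's own and update its Q table.
        for state, action, reward, nxt in experiences:
            q_est = self.q.get((state, action), 0.0)
            q_real = reward + self.gamma * max(
                self.q.get((nxt, a), 0.0) for a in self.actions)
            self.q[(state, action)] = q_est + self.alpha * (q_real - q_est)


moves = ["left", "right"]
agent_a, agent_b = Agent(moves), Agent(moves)
agent_a.experiences = [((0, 0), "right", -0.1, (0, 1))]
agent_b.experiences = [((0, 1), "right", 10.0, (0, 2))]

hq = Headquarters()
for agent in (agent_a, agent_b):
    hq.receive(agent.experiences)
shared = hq.distribute()
for agent in (agent_a, agent_b):
    agent.learn(shared)
```

After the round, agent_a holds a value for the move out of cell (0, 1) even though only agent_b experienced it, which is the enlargement of known environment information that the sharing step is meant to provide.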
This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continuously learn to perceive the environment state and, with the guidance of the safety evacuation indication signs, can quickly find the optimal path in a complex environment.
This embodiment also uses multi-agent information sharing to enlarge the region over which environment information is known, which improves search efficiency and reduces the time needed to reach the destination.
Embodiment 2
As shown in Fig. 7, this embodiment provides a path planning system based on the combination of safety evacuation signs and reinforcement learning, comprising:
(1) a two-dimensional simulation scene model initialization module, configured to establish and rasterize a two-dimensional simulation scene model and to initialize the obstacles, agents and safety evacuation indication signs in the two-dimensional simulation scene model.
To improve realism, the virtual environment is built from the scene data of a real shopping mall. The virtual environment is defined as a region of size M*N, which is then rasterized, and each grid cell is numbered. Each grid cell is denoted $(x_i, y_i)$, where $x_i$ is the row number of the cell and $y_i$ is its column number. M and N are positive integers.
The process of initializing the obstacles, agents and safety evacuation indication signs in the two-dimensional simulation scene model comprises:
defining each agent as a particle that has mass but no volume, and setting a circular region of preset radius centered on the agent as its collision detection region;
setting the number, positions and occupied area sizes of the obstacles;
setting the number, positions, occupied area sizes and instruction content of the safety evacuation indication signs; the two-dimensional modeling effect is shown in Fig. 2.
The setting rules for the safety evacuation indication signs comprise:
Command setting rules for the safety evacuation indication signs, specifically as follows:
↑: go straight; ←: go left; →: go right; ×: no passage. Four further signs, whose symbols are not reproduced in this text, indicate respectively: may go forward or backward; may go left or right; turn left; turn right. The safety evacuation indication signs and their commands are stored in a database in one-to-one correspondence.
Position setting rules for the safety evacuation indication signs, specifically as follows:
A preset number of safety evacuation indication signs are placed in densely populated areas, at mall entrances and exits, and at mall corners to prevent crowd congestion;
A preset number of safety evacuation indication signs are placed in remote areas to prevent people from being trapped;
No-passage signs are placed at important machine-room locations with safety hazards and in off-limits areas.
Placement in other areas must comply with the general rules for setting safety signs.
For example:
In crowded areas, at entrances and exits, and at corners, several go-straight or left/right-turn signs are set so that the crowd can choose quickly there and avoid congestion; in remote areas, several go-straight or left/right-turn signs are set to prevent people from being trapped and unable to escape because they are unfamiliar with the paths; at specific locations that have safety hazards or are not open to the public, no-passage signs are set to avoid accidents; elsewhere in the scene, signs are set reasonably according to the real scene and must comply with the general rules for setting safety signs. The position setting is shown in Fig. 3. In addition to the basic sign directions mentioned above, the figure also contains superposed combinations of the basic directions, which are not described one by one here.
Densely populated areas and remote areas are regions that simulate the actual scene. A densely populated area is a region whose pedestrian flow p exceeds a preset flow pt1; a remote area may be preset as a region whose pedestrian flow p is less than a preset flow pt2 and whose distance from the boundary of the two-dimensional simulation scene model does not exceed a preset distance, where pt2 is less than pt1.
(2) A path planning module, which is used to perform path planning by combining the safety evacuation signs with the Q-Learning algorithm.
In reinforcement learning, an agent repeatedly tries actions in a virtual environment and repeatedly makes mistakes, adjusting its learning strategy according to the reward values fed back by the environment so that the cumulative reward over the learning process is maximized. This optimizes the action taken at each step, so the path finally output is the optimal path. When the reward fed back by the environment for an action is positive, the tendency to perform that action increases; otherwise, the tendency to perform it decreases.
In the initial state, the agents know nothing about the environment and must learn independently, so each agent's initial action selection is random. Once one round of reinforcement learning combined with the safety evacuation signs has been completed, the agents have accumulated some experience and share their information. Each agent then learns from the information obtained from the other agents as if it were its own experience: when it later encounters a state that also appears in the shared information, it may choose the action with the maximum reward value and then update its own Q value.
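The selection behaviour described above (random while the agent has no experience, otherwise the best-known action) can be illustrated as follows. This is a minimal sketch, assuming that "no experience" means all Q values for the current state are equal; the function name `select_action` is hypothetical.

```python
import random


def select_action(q_row, rng=random):
    """Pick an action index from the Q values of the current state.

    q_row: list of Q values, one per action.
    If every value is the same (no experience yet), the choice is random;
    otherwise an action with the largest Q value is chosen.
    """
    best = max(q_row)
    if all(v == best for v in q_row):
        return rng.randrange(len(q_row))
    candidates = [i for i, v in enumerate(q_row) if v == best]
    return rng.choice(candidates)
```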
As shown in Figure 8, the path planning module comprises:
(2.1) a Q-value table initialization module, which is used to initialize the Q value corresponding to each agent in the Q-value table to 0;
(2.2) an agent movement module, which is used to obtain the state information of each agent at the current time, calculate the corresponding reward, and move each agent by selecting the action with the larger Q value;
(2.3) a Q-value table convergence judgment module, which is used to calculate the immediate reward of each agent that has moved to a new position, update the Q-value table, and judge whether the Q-value table has converged; when it has converged, the optimal path sequence is obtained;
(2.4) an information sharing module, which, when the Q-value table has not converged, is used to receive and summarize the input environment information sent by each agent together with the corresponding state, action taken, reward obtained and output environment information, and then to distribute the summarized information to every agent to realize information sharing; each agent then continues to move according to its Q values, the Q-value table is updated, and the updated table is again checked for convergence.
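The text does not say how convergence of the Q-value table is judged. One common choice, shown here purely as an assumption, is to compare the table before and after an update and stop when no entry has changed by more than a small tolerance. The function name `has_converged` and the default tolerance are hypothetical.

```python
def has_converged(old_q, new_q, tol=1e-4):
    """Return True when no Q value changed by more than tol.

    old_q, new_q: dicts mapping (state, action) to a Q value.
    tol is an assumed tolerance; the source does not state one.
    """
    keys = set(old_q) | set(new_q)
    return all(abs(new_q.get(k, 0.0) - old_q.get(k, 0.0)) <= tol for k in keys)
```

Entries missing from one table are treated as 0, which matches the initialization of all Q values to 0.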
Reinforcement learning is an online learning method distinct from supervised and unsupervised learning. An agent interacts with the environment by perceiving the state, selecting an action and receiving a reward; the process is shown in Figure 5. At every step, the agent observes the environment state, then selects and executes an action, which changes its state and earns a reward. Each exploration from the start point to the end point is called one iteration. The agent's learning ability grows with each iteration, so after many iterations the strategy obtained is the optimal one. Q-Learning is one such reinforcement learning algorithm and is defined as follows:
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$
Here, the term $r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1})$ inside the brackets is the actual Q value, denoted $Q_{real}(s_t,a_{t+1})$;
the term $Q(s_t,a_t)$ inside the brackets is the estimated Q value, denoted $Q_{est}(s_t,a_{t+1})$; $\gamma$ is the decay factor applied to future rewards, with $0<\gamma<1$; $\alpha$ is the learning rate, with $0<\alpha<1$, which determines how much of the current error is learned; $s_t$ is the output state information at time t, $a_t$ is the action taken at time t, $r_t$ is the reward obtained at time t, $s_{t+1}$ is the output state information at time t+1, and $a_{t+1}$ is the action taken at time t+1.
The formula above can therefore be written as:
$$Q_{new}(s_t,a_t) = Q_{old}(s_t,a_t) + \alpha\left(Q_{real}(s_t,a_{t+1}) - Q_{est}(s_t,a_{t+1})\right)$$
where $Q_{old}(s_t,a_t)$ denotes the old Q value and $Q_{new}(s_t,a_t)$ denotes the new Q value.
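As a small worked illustration of the update Qnew = Qold + α(Qreal − Qest), the sketch below assumes the standard Q-Learning target, in which Qreal is the reward plus γ times the largest Q value of the next state, and Qest is the current value of the pair being updated. The function name `q_update` and the numbers in the example are illustrative only.

```python
def q_update(q_old, reward, max_next_q, alpha, gamma):
    """One Q-Learning update for a single (state, action) pair."""
    q_real = reward + gamma * max_next_q  # target ("actual") value
    q_est = q_old                         # current estimate
    return q_old + alpha * (q_real - q_est)
```

For example, with q_old = 0, reward = 1, max_next_q = 0.5, alpha = 0.5 and gamma = 0.8, the target is 1.4 and the updated value is about 0.7. With alpha close to 1 the new value would move almost all the way to the target; with alpha close to 0 it would barely change.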
This embodiment applies safety evacuation signs and a reinforcement learning algorithm to path planning. In this process, the action set A of an agent is divided into three parts: basic actions A1, group actions A2 and optimal actions A3, written A = (A1, A2, A3). The basic action set A1 consists of the eight short moves available to each agent: A1 = (up, down, left, right, ul, dl, ur, dr);
where up, down, left, right, ul, dl, ur and dr denote moving up, down, left, right, up-left, down-left, up-right and down-right, respectively.
The group action A2 means that the agent follows the group leader. The optimal action set A3 consists of the eight basic long moves in which the agent follows the instruction of a safety evacuation sign:
A3 = (forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l, turn-r)
The state set S represents each step taken by the agent.
Here, forward, go-l, go-r, stop, fwd or dwbk, go-l or go-r, turn-l and turn-r denote going straight, going left, going right, stopping, going straight or going back, going left or going right, turning left and turning right, respectively.
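One way to represent these action sets on a rasterized scene is sketched below. The mapping of each A1 action to a (row, column) offset, the convention that row 0 is the top of the scene, and the helper `apply_basic_action` are assumptions made for illustration; the text only names the actions.

```python
# Basic action set A1: eight short moves as (row, column) offsets.
# Assumption: row 0 is the top of the scene, so "up" decreases the row index.
A1 = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
    "ul": (-1, -1), "dl": (1, -1), "ur": (-1, 1), "dr": (1, 1),
}

# Optimal action set A3: the eight instructions a sign can carry.
A3 = ("forward", "go-l", "go-r", "stop",
      "fwd or dwbk", "go-l or go-r", "turn-l", "turn-r")


def apply_basic_action(cell, action, rows, cols):
    """Move one cell in the given A1 direction, staying inside the grid."""
    d_row, d_col = A1[action]
    row, col = cell[0] + d_row, cell[1] + d_col
    if 0 <= row < rows and 0 <= col < cols:
        return (row, col)
    return cell  # a move that would leave the scene keeps the agent in place
```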
The learning process for motion planning that combines safety evacuation signs with the Q-Learning algorithm is as follows:
1) Initialize Q(s, a) to 0;
2) The agent observes the state information $s_t$ at time t;
3) According to the current state and the reward value $r_t$, the agent selects the action $a_t$ with the larger Q value and moves;
4) When the action selected by the agent acts on the environment, the state of the environment changes:
that is, the current position changes to the next new position $s_{t+1}$, and an immediate reward $r_t$ is given, where $r_t$ is defined as follows:
5) Update the Q table:
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$
Here, the value of $\gamma$ is set to 0.8. Judge whether the Q-value table has converged; if so, stop the loop and obtain the optimal path sequence; otherwise proceed to the next step;
6) Receive and summarize the input environment information sent by each agent together with the corresponding state, action taken, reward obtained and output environment information, then distribute the summarized information to every agent to realize information sharing, and return to step 2).
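Steps 1) to 6) can be sketched on a toy grid as follows. Several parts are assumptions made for the sake of a runnable example and are not taken from the text: the 4x4 grid, the two sign positions, the reward scheme (10 on reaching the exit, 0 otherwise), the learning rate, pooling of the agents' tables by element-wise maximum, and a fixed number of rounds in place of a convergence test. Only four of the eight basic moves are used to keep the example short.

```python
import random

ROWS, COLS = 4, 4
START, EXIT = (0, 0), (3, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
SIGNS = {(0, 3): 1, (1, 3): 1}              # hypothetical signs: cell -> move index
ALPHA, GAMMA = 0.5, 0.8                     # gamma from the text; alpha assumed


def step(state, a):
    """Apply move a and return (next_state, reward). The reward is assumed."""
    r, c = state[0] + MOVES[a][0], state[1] + MOVES[a][1]
    nxt = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state
    return nxt, (10.0 if nxt == EXIT else 0.0)


def new_q_table():
    # Step 1: every Q(s, a) starts at 0.
    return {(r, c): [0.0] * len(MOVES) for r in range(ROWS) for c in range(COLS)}


def choose(q, state, rng):
    # Step 3: take the action with the largest Q value; on ties follow a sign
    # if the cell has one, otherwise pick at random among the tied actions.
    best = max(q[state])
    tied = [a for a, v in enumerate(q[state]) if v == best]
    if SIGNS.get(state) in tied:
        return SIGNS[state]
    return rng.choice(tied)


def run_episode(q, rng, max_steps=1000):
    state = START                              # Step 2: observe s_t
    for _ in range(max_steps):
        if state == EXIT:
            break
        a = choose(q, state, rng)
        nxt, reward = step(state, a)           # Step 4: the environment responds
        target = reward + GAMMA * max(q[nxt])  # Step 5: Q-Learning update
        q[state][a] += ALPHA * (target - q[state][a])
        state = nxt


def share(tables):
    # Step 6: pool the agents' tables (element-wise maximum, an assumed rule)
    # and hand the pooled values back to every agent.
    for s in tables[0]:
        pooled = [max(t[s][a] for t in tables) for a in range(len(MOVES))]
        for t in tables:
            t[s] = list(pooled)


def train(n_agents=3, rounds=60, seed=0):
    rng = random.Random(seed)
    tables = [new_q_table() for _ in range(n_agents)]
    for _ in range(rounds):
        for q in tables:
            run_episode(q, rng)
        share(tables)
    return tables[0]


def greedy_path(q, limit=ROWS * COLS):
    """Follow the largest Q value from START until EXIT or the step limit."""
    path, state = [START], START
    while state != EXIT and len(path) <= limit:
        a = max(range(len(MOVES)), key=lambda i: q[state][i])
        state, _ = step(state, a)
        path.append(state)
    return path
```

Calling `greedy_path(train())` returns the sequence of cells that the learned table leads through from the start cell to the exit. The sketch makes no claim that this path is the shortest one; it only shows how the observe, act, update and share steps fit together.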
Because this embodiment simulates the real crowd movement of a shopping plaza, the crowd is modeled as a large number of agents. An agent cannot exist in isolation: individual movement does not reflect the group characteristics of people, and a single agent cannot complete the task efficiently in an evacuation scene. The limited scene resources it has grasped slow its learning and lengthen the time needed to output the optimal path; in the worst case it cannot complete the target task at all. Therefore, before the next reinforcement learning iteration, each agent outputs the environment information obtained through its own reinforcement learning to a headquarters information processor, which then issues the summarized information to every agent. This completes information sharing among the multiple agents, where the shared information includes strategies, experience and environment states. Each agent then updates its own resources according to the information obtained from the headquarters information processor and, taking its own Q values and its own historical strategy into account, determines its action strategy for the next iteration, as shown in Figure 6.
In a specific implementation, the information sharing module comprises two parts: the main processor of each agent and a headquarters information controller. The main processor of an agent handles the input environment information (such as the distance and angle between the agent and obstacles or safety evacuation signs in the current state, and the content of the safety evacuation signs), the output state $s_t$, the action taken $a_t$, the reward obtained $r_t$ and the output environment information, and manages the information the agent itself has obtained. The headquarters information controller summarizes the information shared by every agent and then distributes it back to every agent, realizing information sharing so that the next iteration can proceed quickly, as shown in Figure 9.
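The collect, summarize and distribute cycle described above might look like the following sketch. The record layout `Experience`, the class name `Headquarters`, the function `distribute` and the summarizing rule (keep, for each state, the experience with the highest reward) are all assumptions for illustration; the text does not specify how the information is summarized.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    state: tuple    # s_t, for example the grid cell of the agent
    action: str     # a_t
    reward: float   # r_t
    env_info: str   # for example the content of the nearest sign


class Headquarters:
    """Collects experiences from all agents and builds a shared summary."""

    def __init__(self):
        self._by_state = defaultdict(list)

    def receive(self, agent_id, experiences):
        for exp in experiences:
            self._by_state[exp.state].append((agent_id, exp))

    def summary(self):
        # For each state keep the experience with the highest reward, so an
        # agent that meets this state later can reuse the best-known action.
        return {state: max(items, key=lambda item: item[1].reward)[1]
                for state, items in self._by_state.items()}


def distribute(hq, agent_memories):
    """Give every agent the same summarized view (dict: agent id -> memory)."""
    shared = hq.summary()
    for memory in agent_memories.values():
        memory.update(shared)
```

After `distribute`, every agent holds the same summary, so an agent that has never visited a cell can still look up the action another agent found most rewarding there.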
This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continually learn to perceive the environment state; together with the guidance provided by the safety evacuation signs, this allows the optimal path in a complex environment to be found quickly.
This embodiment also uses multi-agent information sharing, which enlarges the region over which environment information is known, improves search efficiency and reduces the time needed to reach the destination.
Embodiment 3
This embodiment provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning shown in Figure 1.
This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continually learn to perceive the environment state; together with the guidance provided by the safety evacuation signs, this allows the optimal path in a complex environment to be found quickly.
This embodiment also uses multi-agent information sharing, which enlarges the region over which environment information is known, improves search efficiency and reduces the time needed to reach the destination.
Embodiment 4
This embodiment provides a computer device comprising a memory, a processor and a computer program that is stored in the memory and can run on the processor. When executing the program, the processor implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning shown in Figure 1.
This embodiment combines safety evacuation signs with reinforcement learning and does not depend on an environment model. Through the trial-and-error mechanism of reinforcement learning, the agents continually learn to perceive the environment state; together with the guidance provided by the safety evacuation signs, this allows the optimal path in a complex environment to be found quickly.
This embodiment also uses multi-agent information sharing, which enlarges the region over which environment information is known, improves search efficiency and reduces the time needed to reach the destination.
Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system or a computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are only preferred embodiments of the present disclosure and are not intended to limit it. For those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present disclosure shall fall within its protection scope.

Claims (10)

1. A path planning method based on the combination of safety evacuation signs and reinforcement learning, characterized by comprising:
Step 1: establishing and rasterizing a two-dimensional simulation scene model, and initializing the obstacles, agents and safety evacuation signs in the two-dimensional simulation scene model;
Step 2: performing path planning by combining the safety evacuation signs with the Q-Learning algorithm;
The detailed process of step 2 is:
Step 2.1: initializing the Q value corresponding to each agent in the Q-value table to 0;
Step 2.2: obtaining the state information of each agent at the current time, calculating the corresponding reward, and moving each agent by selecting the action with the larger Q value;
Step 2.3: calculating the immediate reward of each agent that has moved to a new position, updating the Q-value table, and judging whether the Q-value table has converged; if so, obtaining the optimal path sequence; otherwise proceeding to the next step;
Step 2.4: receiving and summarizing the input environment information sent by each agent together with the corresponding state, action taken, reward obtained and output environment information, then distributing the summarized information to every agent to realize information sharing, and returning to step 2.2.
2. The path planning method based on the combination of safety evacuation signs and reinforcement learning according to claim 1, characterized in that, in step 2.3, the immediate reward of each agent that has moved to a new position is set as $r_t$
3. The path planning method based on the combination of safety evacuation signs and reinforcement learning according to claim 1, characterized in that, in step 1, the process of rasterizing the two-dimensional simulation scene model is:
defining the two-dimensional simulation scene model as a region of size M*N, then rasterizing it and numbering each grid cell, where M and N are positive integers.
4. The path planning method based on the combination of safety evacuation signs and reinforcement learning according to claim 1, characterized in that, in step 1, the process of initializing the obstacles, agents and safety evacuation signs in the two-dimensional simulation scene model comprises:
defining each agent as a particle that has mass but no volume, and setting a circular region of preset radius centered on the agent as its collision detection region;
setting the number, positions and occupied area of the obstacles;
setting the number, positions, occupied area and instruction content of the safety evacuation signs.
5. A path planning system based on the combination of safety evacuation signs and reinforcement learning, characterized by comprising:
a two-dimensional simulation scene model initialization module, which is used to establish and rasterize a two-dimensional simulation scene model and to initialize the obstacles, agents and safety evacuation signs in the two-dimensional simulation scene model;
a path planning module, which is used to perform path planning by combining the safety evacuation signs with the Q-Learning algorithm;
The path planning module comprises:
a Q-value table initialization module, which is used to initialize the Q value corresponding to each agent in the Q-value table to 0;
an agent movement module, which is used to obtain the state information of each agent at the current time, calculate the corresponding reward, and move each agent by selecting the action with the larger Q value;
a Q-value table convergence judgment module, which is used to calculate the immediate reward of each agent that has moved to a new position, update the Q-value table, and judge whether the Q-value table has converged; when it has converged, the optimal path sequence is obtained;
an information sharing module, which, when the Q-value table has not converged, is used to receive and summarize the input environment information sent by each agent together with the corresponding state, action taken, reward obtained and output environment information, and then to distribute the summarized information to every agent to realize information sharing; each agent then continues to move according to its Q values, the Q-value table is updated, and the updated table is again checked for convergence.
6. The path planning system based on the combination of safety evacuation signs and reinforcement learning according to claim 5, characterized in that, in the Q-value table convergence judgment module, the immediate reward of each agent that has moved to a new position is set as $r_t$
7. The path planning system based on the combination of safety evacuation signs and reinforcement learning according to claim 5, characterized in that, in the Q-value table, the process of rasterizing the two-dimensional simulation scene model is:
defining the two-dimensional simulation scene model as a region of size M*N, then rasterizing it and numbering each grid cell, where M and N are positive integers.
8. The path planning system based on the combination of safety evacuation signs and reinforcement learning according to claim 5, characterized in that, in the Q-value table, the process of initializing the obstacles, agents and safety evacuation signs in the two-dimensional simulation scene model comprises:
defining each agent as a particle that has mass but no volume, and setting a circular region of preset radius centered on the agent as its collision detection region;
setting the number, positions and occupied area of the obstacles;
setting the number, positions, occupied area and instruction content of the safety evacuation signs.
9. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, it implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning according to any one of claims 1-4.
10. A computer device comprising a memory, a processor and a computer program that is stored in the memory and can run on the processor, characterized in that, when executing the program, the processor implements the steps of the path planning method based on the combination of safety evacuation signs and reinforcement learning according to any one of claims 1-4.
CN201910289774.3A 2019-04-11 2019-04-11 Route planning method and system based on combination of safety evacuation signs and reinforcement learning Active CN109974737B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910289774.3A CN109974737B (en) 2019-04-11 2019-04-11 Route planning method and system based on combination of safety evacuation signs and reinforcement learning
LU101606A LU101606B1 (en) 2019-04-11 2020-01-27 Path planning method and system based on combination of safety evacuation signs and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910289774.3A CN109974737B (en) 2019-04-11 2019-04-11 Route planning method and system based on combination of safety evacuation signs and reinforcement learning

Publications (2)

Publication Number Publication Date
CN109974737A true CN109974737A (en) 2019-07-05
CN109974737B CN109974737B (en) 2020-01-31

Family

ID=67084173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910289774.3A Active CN109974737B (en) 2019-04-11 2019-04-11 Route planning method and system based on combination of safety evacuation signs and reinforcement learning

Country Status (2)

Country Link
CN (1) CN109974737B (en)
LU (1) LU101606B1 (en)

Also Published As

Publication number Publication date
CN109974737B (en) 2020-01-31
LU101606B1 (en) 2020-05-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220426

Address after: 250014 No. 19, ASTRI Road, Lixia District, Jinan, Shandong

Patentee after: Shandong Center Information Technology Limited by Share Ltd.

Address before: 250014 No. 88, Wenhua East Road, Lixia District, Jinan, Shandong

Patentee before: Shandong Normal University

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Path planning method and system based on the combination of safety evacuation signs and reinforcement learning

Effective date of registration: 20230301

Granted publication date: 20200131

Pledgee: Bank of Beijing Co.,Ltd. Jinan Branch

Pledgor: Shandong Center Information Technology Limited by Share Ltd.

Registration number: Y2023370000045