CN106843225A - Transformer substation inspection robot path planning system - Google Patents

Transformer substation inspection robot path planning system

Info

Publication number
CN106843225A
CN106843225A (application CN201710153238.1A)
Authority
CN
China
Prior art keywords
action
information
intensity
robot
control module
Prior art date
Legal status
Granted
Application number
CN201710153238.1A
Other languages
Chinese (zh)
Other versions
CN106843225B (en)
Inventor
蔡乐才
吴昊霖
高祥
居锦武
陈冬君
刘鑫
Current Assignee
Yibin University
Original Assignee
Yibin University
Priority date
Filing date
Publication date
Application filed by Yibin University
Priority to CN201710153238.1A (granted as CN106843225B)
Publication of CN106843225A
Application granted
Publication of CN106843225B
Expired - Fee Related

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/80 Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a transformer substation inspection robot path planning system based on information-strength-guided heuristic Q-learning. The system comprises a central control module, a distance sensor module, an RFID module and a motion control module. The distance sensor module consists of 7 range sensors and sends the measured range data to the central control module for obstacle avoidance. The RFID module consists of RFID tags distributed at fixed points and an RFID reader on the inspection robot, and sends RFID landmark data and target-location data to the central control module for position calibration of the robot and determination of the target location. The motion control module receives commands from the central control module and sets the direction of motion. The central control module is the inspection robot's Agent. Using a reinforcement-learning path planning system, the invention performs special inspection tasks on key designated equipment under special weather and similar conditions, and avoids the track maintenance work required by path planning methods such as magnetic tracks.

Description

Transformer substation inspection robot path planning system
Technical field
The present invention relates to the field of inspection robot navigation, and in particular to a transformer substation inspection robot path planning system.
Background technology
In a power system, the most basic characteristic of electric energy is that it cannot be stored on a large scale: its production, transmission, distribution and consumption are all continuous. The whole power system is interconnected as a network, and the introduction of market mechanisms has brought huge economic benefits, but at the same time the safe and stable operation of the system faces enormous challenges. The substation system is one of the three core systems of the power system (production, transmission and distribution) and plays an important role in the safety of the whole system. At present, substations are inspected mainly in two ways: manual inspection and robot inspection. Intelligent inspection robots patrol and examine outdoor substation equipment under remote control or autonomous control; they can replace humans in repetitive, tedious and high-risk inspection work, and can complete routine inspection tasks more accurately.
Substation inspection robots are a type of mobile robot. Research on mobile robots abroad not only started earlier but has also developed quickly. Domestic research started later and still lags behind the international state of the art, but the pace of exploration is accelerating. With the support of the national "863 Program", research institutions such as Tsinghua University, Harbin Institute of Technology and the Chinese Academy of Sciences began to study intelligent mobile robots and achieved certain results. China's research on robots for intelligent substation inspection began in 2002 with the support of the national "863" plan. In October 2005, China's first substation equipment inspection robot, independently developed by the Shandong Electric Power Research Institute, was put into operation at Changqing. In February 2012, China's first rail-mounted inspection robot was put into trial operation. This indicates that substation inspection robots in China are developing rapidly; while the level of autonomous mobile robot technology advances, the intelligence level of the power grid is also being improved. Inspection robots are now widely used in China and will continue to be applied in the intelligent inspection projects of the national grid. By the end of 2014, at least 27 provinces, cities, autonomous regions and municipalities across the country had adopted substation inspection robots, covering the China Southern Power Grid, the North China Power Grid, the East China Power Grid and the Northwest Power Grid. It is therefore necessary to improve and perfect the functions of substation inspection robots.
Substation inspection robot work can be divided into normal inspection and special inspection. In normal inspection, the robot tours all substation equipment; in special inspection, it tours certain designated equipment under special circumstances, typically hot weather, heavy load operation, newly commissioned equipment, or harsh environments such as hail and lightning. When performing special inspection, currently common inspection robots such as magnetic-track robots lack flexibility. Behavior-based path planning for inspection robots is essentially a mapping from the environmental states perceived by the sensors to actuator actions. An inspection robot using this technique can respond to changes in the external environment in real time and quickly, so the quality of path planning directly affects inspection efficiency. Reinforcement learning is an important branch of machine learning that has attracted increasing attention in recent years and has found ever wider and more complex practical applications. It completes learning by interacting with the environment through trial and error: if the environment evaluates an action positively, the tendency to select that action is strengthened; otherwise it is weakened. The Agent obtains the optimal policy through continuous training. Reinforcement learning therefore has the characteristics of autonomous learning and online learning, can be applied to robot path planning through training, and has been widely used in mobile robot path planning problems.
Although reinforcement learning has many advantages and a promising application prospect, it also suffers from slow convergence, the "curse of dimensionality", the exploration-exploitation trade-off, and temporal credit assignment. One reason for its slow convergence is the absence of a teacher signal: the optimal action policy can only be obtained gradually through exploration and environmental evaluation. To further accelerate convergence, heuristic reinforcement learning injects certain prior knowledge into reinforcement learning, effectively improving its convergence speed. Torrey et al. injected prior knowledge into reinforcement learning algorithms through transfer learning to improve convergence; but the prior knowledge injected by transfer learning is fixed, so even unreasonable rules cannot be revised online during training. Bianchi et al. added a heuristic function to the traditional reinforcement learning algorithm and selected actions during training using both the value function and the heuristic function, proposing the Heuristically Accelerated Reinforcement Learning (HARL) algorithm model. The most important feature of heuristic reinforcement learning is that the heuristic function is updated online, continuously strengthening the actions that perform better. On the basis of heuristic reinforcement learning, Fang Min et al. proposed a heuristic reinforcement learning method based on state backtracking, which describes the importance of repeated actions by introducing a cost function and combines action reward and action cost into a new heuristic function definition to further improve convergence speed; but this method only evaluates the importance of repeated actions.
Summary of the invention
To solve the above problems, the present invention provides a transformer substation inspection robot path planning system.
To achieve the above object, the present invention adopts the following technical scheme:
A transformer substation inspection robot path planning system based on information-strength-guided heuristic Q-learning, comprising a central control module, a distance sensor module, an RFID module and a motion control module. The distance sensor module consists of 7 range sensors and sends the measured range data to the central control module for obstacle avoidance. The RFID module consists of RFID tags distributed at fixed points and an RFID reader on the inspection robot, and sends RFID landmark data and target-location data to the central control module for position calibration of the robot and determination of the target location. The motion control module receives commands from the central control module and sets the direction of motion. The central control module is the inspection robot's Agent: it receives the data transmitted by the other modules, determines the action policy, and sends commands to the motion control module to plan the path.
Wherein, with the inspection robot's forward direction as the zero-degree line, the seven range sensors are mounted on the robot at -90°, -60°, -30°, 0°, 30°, 60° and 90° respectively.
Wherein, the establishment of the inspection robot's reward and punishment mechanism is completed by the following steps:
Step 1: Set the movement rewards. To encourage the robot to move to the target point in as few steps as possible, every executed action produces a punishment value; at the same time, to encourage the robot to judge ahead and avoid large-angle movements when they are unnecessary, large-angle movements are punished somewhat more heavily. Specifically: when the action belongs to {-30°, 0°, 30°}, the punishment value is -0.2; when the action belongs to {-60°, 60°}, the punishment value is -0.5;
Step 2: Set the target-location rewards. The positions of the inspection robot and the target equipment are calibrated using RFID. After each step of the robot, the distance d between the current position and the target location is calculated, and -d (the negated distance) is taken as the current target reward; the reward for reaching the target location is set to +100;
Step 3: Set the obstacle-avoidance rewards, using two levels. When any of the seven range sensors measures less than 0.1 m, the robot is considered to have hit an obstacle (equipment, wall, etc.); the punishment value is -100 and, treating this as a terminal state, the current episode ends and the next episode begins. When any of the seven range sensors measures more than 0.1 m but less than half the robot's body height, the punishment value is -2, to encourage early avoidance. A code sketch of these three reward rules follows.
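For illustration, the three reward rules above can be written as a single reward function. The following Python sketch is illustrative only; the function and parameter names (step_reward, half_body_height) are not part of the claimed system, and the half-body-height threshold is assumed as a parameter:

```python
import math

def step_reward(action_deg, position, goal, sensor_dists, half_body_height=0.3):
    """Combined step reward under the three rules above; returns (reward, done)."""
    # Rule 1: every move is penalised; large-angle moves cost more.
    reward = -0.2 if action_deg in (-30, 0, 30) else -0.5
    # Rule 3: two-level obstacle avoidance.
    nearest = min(sensor_dists)
    if nearest < 0.1:                 # collision with equipment or wall
        return -100.0, True           # terminal state: episode ends
    if nearest < half_body_height:    # close call: encourage early avoidance
        reward += -2.0
    # Rule 2: distance shaping plus the goal bonus.
    d = math.dist(position, goal)
    reward += -d
    if d == 0:                        # reached the target location
        return reward + 100.0, True
    return reward, False
```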
Wherein, the central control module completes the planning of the inspection robot's path based on the following steps:
Step 1: Initialize the Agent
Initialize the state-action value function and the heuristic function; determine the target equipment position and the inspection start position;
Step 2: Design table H to record information strength
Table H is defined as the four-tuple <s_i, a_i, p(s_i, a_i), f_max>, where s_i is the information state whose information strength is to be updated; a_i is the information action whose information strength is to be updated; p(s_i, a_i) is the updated information strength, a scalar proportional to fitness; and f_max is the maximum fitness previously recorded for the information state s_i;
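Table H is naturally represented as a mapping from state-action pairs to strengths, plus one recorded fitness maximum per state. A minimal Python sketch, assuming states and actions are hashable (the class name TableH is illustrative):

```python
class TableH:
    """Table H: information strengths p(s_i, a_i) and per-state fitness maxima."""
    def __init__(self):
        self.p = {}      # (s_i, a_i) -> information strength p(s_i, a_i)
        self.f_max = {}  # s_i -> largest fitness recorded so far for s_i

    def strength(self, s, a):
        return self.p.get((s, a), 0.0)

    def best_fitness(self, s):
        return self.f_max.get(s, float("-inf"))
```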
Step 3: Update the state-action value function
The Q-learning state-action value function is updated by the rule

$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ R + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where α is the learning rate and γ is the discount factor;
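In code, this update is one line over a tabular Q. A sketch assuming Q is a dict keyed by (state, action) with default value 0; the defaults for alpha and gamma are illustrative, not values fixed by the patent:

```python
def q_update(Q, s, a, R, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning: Q(s,a) += alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (R + gamma * best_next - Q.get((s, a), 0.0))
```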
Step 4: Update the fitness maximum
The fitness value is defined as the discounted cumulative return obtained as the Agent moves from the initial state to the target state during one episode of training:

$$f = \sum_{t} \beta^{t} R_{t}$$

where β is the fitness discount factor and R_t is the return obtained by each movement of the Agent. When the fitness value obtained by the Agent on completing an episode exceeds the maximum fitness in table H, the fitness maximum is updated;
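The discounted return can be accumulated online over one episode; a minimal sketch (the default value of beta is illustrative):

```python
def episode_fitness(rewards, beta=0.95):
    """Fitness f = sum_t beta^t * R_t over one episode's reward sequence."""
    f, discount = 0.0, 1.0
    for r in rewards:
        f += discount * r
        discount *= beta
    return f
```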
Step 5: Update the information strength
If the fitness maximum is updated, the information strength is updated accordingly. The update rule for the information strength p(s_i, a_i) is

$$p(s_i, a_i) = \begin{cases} p(s_i, a_i) \cdot \dfrac{f_{\max}}{f}, & \text{if } a_i \neq a_t \\ 1, & \text{if } a_i = a_t \end{cases}$$

where a_t is the action taken in state s_i in the Agent's latest episode of learning, a_i is an information action in table H, and f_max is the fitness maximum in table H;
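A sketch of the table-H update, reusing the illustrative TableH above; it assumes positive fitness values so that the scaling factor f_max/f lies in (0, 1):

```python
def update_information_strength(table, s, a_t, f, actions):
    """If episode fitness f beats the recorded maximum for state s, set
    p(s, a_t) = 1 and scale every other action's strength by f_max / f."""
    f_max = table.best_fitness(s)
    if f <= f_max:
        return                        # no improvement: table H unchanged
    for a in actions:
        if a == a_t:
            table.p[(s, a)] = 1.0
        else:
            prev = table.strength(s, a)
            # A zero strength stays zero; this also covers the very first
            # update, before any finite f_max has been recorded.
            table.p[(s, a)] = prev * (f_max / f) if prev else 0.0
    table.f_max[s] = f
```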
Step 6: Determine the heuristic function based on information strength
So that the obtained information strengths are reflected directly in action selection, the information strength is folded into the heuristic function, and an influence-magnitude parameter controls how strongly the information strength affects action selection. The heuristic function is updated as

$$H_t(s_t, a) = \begin{cases} \max\limits_{a} Q(s_t, a) - Q(s_t, a_t) + \dfrac{p(s_t, a_t)}{\sum_{a} p(s_t, a)} \, U, & \text{if } a_t = \pi_p(s_t) \\ 0, & \text{otherwise} \end{cases}$$

where π_p(s_t) is the optimal action under the information-strength heuristic; the ratio of the maximum information strength to the sum of the information strengths, which represents the importance of that action, is denoted h; and U is the influence-magnitude parameter of the information strength on action selection, with larger U meaning greater influence;
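A sketch of the heuristic computation under the same illustrative data structures; the default U follows the value 1.5 used in the experiments below:

```python
def heuristic(Q, table, s, actions, U=1.5):
    """H(s, a): nonzero only for the strength-optimal action pi_p(s), which is
    lifted to the greedy Q level plus the importance bonus h * U."""
    strengths = [table.strength(s, a) for a in actions]
    H = {a: 0.0 for a in actions}
    total = sum(strengths)
    if total <= 0.0:
        return H                      # no information strength recorded yet
    a_star = actions[strengths.index(max(strengths))]   # pi_p(s)
    max_q = max(Q.get((s, a), 0.0) for a in actions)
    h = max(strengths) / total        # importance of the strength-optimal action
    H[a_star] = max_q - Q.get((s, a_star), 0.0) + h * U
    return H
```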
In the above update rule, only the heuristic function of the optimal action under the information-strength heuristic is updated and acts on the selection of the action policy; the heuristic functions of all other actions are set to 0. When the value function of the optimal action under the pheromone-strength heuristic is lower than that of another action, superimposing the heuristic function makes action selection favour the action with the larger pheromone strength, rather than the action whose value function happens to be larger under incomplete exploration;
Step 7: Determine the policy under the joint action of the heuristic function and the value function
The action selection policy of information-strength-guided heuristic Q-learning uses the Boltzmann mechanism:

$$P(a_i \mid s) = \frac{e^{[Q(s, a_i) + H(s, a_i)]/T}}{\sum_{k=1}^{N} e^{[Q(s, a_k) + H(s, a_k)]/T}}$$

where T is the temperature parameter. Under the Boltzmann mechanism, if the action with the current maximum action value is not the pheromone-optimal action, then superimposing H(s_t, a) on Q(s_t, a) increases the selection probability of the pheromone-optimal action. When the pheromone strengths of different actions are close, the mechanism keeps the selection probabilities of the maximum-value action and the pheromone-optimal action close, avoiding entrapment in a local optimum under pheromone strength; when the pheromone-strength gap is large, action selection is biased toward the pheromone-optimal action, which helps the algorithm converge.
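Action selection then samples from the Boltzmann distribution over Q + H; a sketch, where the default temperature T is chosen only for illustration:

```python
import math
import random

def boltzmann_select(Q, H, s, actions, T=0.5):
    """Sample an action with probability proportional to exp((Q(s,a) + H(s,a)) / T)."""
    prefs = [(Q.get((s, a), 0.0) + H[a]) / T for a in actions]
    m = max(prefs)                                  # subtract max for stability
    weights = [math.exp(v - m) for v in prefs]
    return random.choices(actions, weights=weights, k=1)[0]
```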
The invention has the following beneficial effects:
Using a reinforcement-learning path planning system, the invention performs special inspection tasks on key designated equipment under special weather and similar conditions, avoiding the track maintenance work of path planning methods such as magnetic tracks. It proposes an information-strength-guided heuristic Q-learning algorithm whose information strength can be updated online: on the basis of heuristic reinforcement learning, the algorithm updates the information strength online according to the return of each training episode and determines the policy by combining action information strengths of different magnitudes with the state-action value function, thereby improving the convergence speed of the algorithm.
Brief description of the drawings
Fig. 1 is a system block diagram of a transformer substation inspection robot path planning system according to an embodiment of the present invention.
Fig. 2 is an installation diagram of the 7 range sensors in the embodiment of the present invention.
Fig. 3 is a flow chart of path planning in the central control module in the embodiment of the present invention.
Fig. 4 is the substation simulation experiment diagram in the embodiment of the present invention.
Fig. 5 is the cumulative success rate result diagram in the embodiment of the present invention.
Fig. 6 is the average step count result diagram of the algorithms in the embodiment of the present invention.
Fig. 7 is the average cumulative return result diagram of the algorithms in the embodiment of the present invention.
Specific embodiment
To make the objects and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments. It should be understood that the specific embodiments described here are merely illustrative of the present invention and are not intended to limit it.
As shown in Fig. 1, an embodiment of the present invention provides a transformer substation inspection robot path planning system based on information-strength-guided heuristic Q-learning, comprising a central control module, a distance sensor module, an RFID module and a motion control module. The distance sensor module consists of 7 range sensors and sends the measured range data to the central control module for obstacle avoidance. The RFID module consists of RFID tags distributed at fixed points and an RFID reader on the inspection robot, and sends RFID landmark data and target-location data to the central control module for position calibration of the robot and determination of the target location. The motion control module receives commands from the central control module and sets the direction of motion. The central control module is the inspection robot's Agent: it receives the data transmitted by the other modules, determines the action policy, and sends commands to the motion control module to plan the path.
As shown in Fig. 2, with the inspection robot's forward direction as the zero-degree line, the seven range sensors are mounted on the robot at -90°, -60°, -30°, 0°, 30°, 60° and 90° respectively; correspondingly, the robot's motion pattern is set to movements in the -60°, -30°, 0°, 30° and 60° directions.
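The sensor and motion-direction layout reads directly as two constant tuples (a trivial sketch; the names are illustrative):

```python
# Seven range sensors, mounted from -90 deg to +90 deg in 30 deg steps,
# measured from the robot's forward (zero-degree) axis.
SENSOR_ANGLES = (-90, -60, -30, 0, 30, 60, 90)

# Motion commands are restricted to the five inner directions.
MOTION_ANGLES = (-60, -30, 0, 30, 60)
```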
Wherein, the establishment of the inspection robot's reward and punishment mechanism is completed by the following steps:
Step 1: Set the movement rewards. To encourage the robot to move to the target point in as few steps as possible, every executed action produces a punishment value; at the same time, to encourage the robot to judge ahead and avoid large-angle movements when they are unnecessary, large-angle movements are punished somewhat more heavily. Specifically: when the action belongs to {-30°, 0°, 30°}, the punishment value is -0.2; when the action belongs to {-60°, 60°}, the punishment value is -0.5;
Step 2: Set the target-location rewards. The positions of the inspection robot and the target equipment are calibrated using RFID. After each step of the robot, the distance d between the current position and the target location is calculated, and -d (the negated distance) is taken as the current target reward; the reward for reaching the target location is set to +100;
Step 3: Set the obstacle-avoidance rewards, using two levels. When any of the seven range sensors measures less than 0.1 m, the robot is considered to have hit an obstacle (equipment, wall, etc.); the punishment value is -100 and, treating this as a terminal state, the current episode ends and the next episode begins. When any of the seven range sensors measures more than 0.1 m but less than half the robot's body height, the punishment value is -2, to encourage early avoidance.
As shown in Fig. 3, the central control module completes the planning of the inspection robot's path based on the following steps:
Step 1: Initialize the Agent
Initialize the state-action value function and the heuristic function; determine the target equipment position and the inspection start position;
Step 2: Design table H to record information strength
Table H is defined as the four-tuple <s_i, a_i, p(s_i, a_i), f_max>, where s_i is the information state whose information strength is to be updated; a_i is the information action whose information strength is to be updated; p(s_i, a_i) is the updated information strength, a scalar proportional to fitness; and f_max is the maximum fitness previously recorded for the information state s_i;
Step 3: Update the state-action value function
The Q-learning state-action value function is updated by the rule

$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ R + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where α is the learning rate and γ is the discount factor;
Step 4: Update the fitness maximum
The fitness value is defined as the discounted cumulative return obtained as the Agent moves from the initial state to the target state during one episode of training:

$$f = \sum_{t} \beta^{t} R_{t}$$

where β is the fitness discount factor and R_t is the return obtained by each movement of the Agent. When the fitness value obtained by the Agent on completing an episode exceeds the maximum fitness in table H, the fitness maximum is updated;
Step 5: Update the information strength
If the fitness maximum is updated, the information strength is updated accordingly. The update rule for the information strength p(s_i, a_i) is

$$p(s_i, a_i) = \begin{cases} p(s_i, a_i) \cdot \dfrac{f_{\max}}{f}, & \text{if } a_i \neq a_t \\ 1, & \text{if } a_i = a_t \end{cases}$$

where a_t is the action taken in state s_i in the Agent's latest episode of learning, a_i is an information action in table H, and f_max is the fitness maximum in table H;
With the above update rule, the information strength p(s_i, a_i) is determined by the degree of difference between the fitness f and the fitness maximum f_max in table H. When f exceeds the f_max stored in table H, the information strength needs to be updated, i.e. table H needs to be updated. Based on this rule, the algorithm retains the previous information strengths while letting the strength updated according to the fitness difference reflect the importance of the different information actions.
Assume a_i ∈ {a_1, a_2, …, a_N}, that executing a_m during training obtains the maximum fitness f_1, and that the fitness maximum in table H before the update is f_max = f_0. Updating according to the above formula then gives: (I) if a_i = a_m, then p(s_i, a_m) = 1; (II) if a_i ≠ a_m: (1) when p(s_i, a_i) = 0, p(s_i, a_i) is still 0 after the update; (2) when p(s_i, a_i) = 1, after the update p(s_i, a_i) = f_0/f_1; (3) when p(s_i, a_i) = f′/f_0 for some earlier fitness f′, after the update p(s_i, a_i) = f′/f_1.
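Running the update rule through these cases confirms the algebra. A check reusing the illustrative TableH and update_information_strength sketches above, with assumed positive fitness values f_0 = 40 and f_1 = 80:

```python
table = TableH()
acts = ["a1", "a2", "a3"]
update_information_strength(table, "s_i", "a1", 40.0, acts)  # f0 = 40, a1 best
update_information_strength(table, "s_i", "a2", 80.0, acts)  # f1 = 80 > f0
assert table.strength("s_i", "a2") == 1.0   # case (I):  a_i == a_t
assert table.strength("s_i", "a1") == 0.5   # case (2):  1 -> f0 / f1
assert table.strength("s_i", "a3") == 0.0   # case (1):  0 stays 0
```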
Step 6: Determine the heuristic function based on information strength
So that the obtained information strengths are reflected directly in action selection, the information strength is folded into the heuristic function, and an influence-magnitude parameter controls how strongly the information strength affects action selection. The heuristic function is updated as

$$H_t(s_t, a) = \begin{cases} \max\limits_{a} Q(s_t, a) - Q(s_t, a_t) + \dfrac{p(s_t, a_t)}{\sum_{a} p(s_t, a)} \, U, & \text{if } a_t = \pi_p(s_t) \\ 0, & \text{otherwise} \end{cases}$$

where π_p(s_t) is the optimal action under the information-strength heuristic; the ratio of the maximum information strength to the sum of the information strengths, which represents the importance of that action, is denoted h; and U is the influence-magnitude parameter of the information strength on action selection, with larger U meaning greater influence;
In the above update rule, only the heuristic function of the optimal action under the information-strength heuristic is updated and acts on the selection of the action policy; the heuristic functions of all other actions are set to 0. When the value function of the optimal action under the pheromone-strength heuristic is lower than that of another action, superimposing the heuristic function makes action selection favour the action with the larger pheromone strength, rather than the action whose value function happens to be larger under incomplete exploration. Note that, as the above formula shows, the heuristic function does not act directly on the action value function and does not change it; rather, it is superimposed on it, the superimposed value is used to determine the action selection policy, and the return of this episode's learning then acts on the update of the action value function.
Step 7: Determine the policy under the joint action of the heuristic function and the value function
The action selection policy of information-strength-guided heuristic Q-learning uses the Boltzmann mechanism:

$$P(a_i \mid s) = \frac{e^{[Q(s, a_i) + H(s, a_i)]/T}}{\sum_{k=1}^{N} e^{[Q(s, a_k) + H(s, a_k)]/T}}$$

where T is the temperature parameter. Under the Boltzmann mechanism, if the action with the current maximum action value is not the pheromone-optimal action, then superimposing H(s_t, a) on Q(s_t, a) increases the selection probability of the pheromone-optimal action. When the pheromone strengths of different actions are close, the mechanism keeps the selection probabilities of the maximum-value action and the pheromone-optimal action close, avoiding entrapment in a local optimum under pheromone strength; when the pheromone-strength gap is large, action selection is biased toward the pheromone-optimal action, which helps the algorithm converge. In addition, the Boltzmann mechanism gives the other actions a certain probability of being chosen, which promotes exploration by the algorithm.
The simulation environment is set with a substation as the background. As shown in Fig. 4, solid red areas represent equipment obstacles and the surrounding border represents walls. The start position is set to (1,1) and the target position to (18,17). The return value of the target position is 100; the return values of the remaining positions are distributed in [0,2] according to each position's distance difference to the target, with a smaller distance difference giving a larger return value. To encourage the Agent to find the target position in the fewest steps, the Agent receives a return of -1 for every action it performs. The Agent's action space is {1, 2, 3, 4}, representing up, down, left and right respectively. If the Agent hits an obstacle or a wall, it returns to the start position and receives a punishment of -10.
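A minimal grid-world consistent with this description (the grid size and obstacle layout are illustrative; the equipment regions of Fig. 4 are not reproduced):

```python
import math

class SubstationGrid:
    """Toy simulation: start (1,1), goal (18,17); actions 1-4 = up/down/left/right."""
    MOVES = {1: (0, 1), 2: (0, -1), 3: (-1, 0), 4: (1, 0)}

    def __init__(self, size=20, obstacles=frozenset()):
        self.size, self.obstacles = size, obstacles
        self.start, self.goal = (1, 1), (18, 17)
        self.pos = self.start
        self.max_d = math.dist(self.start, self.goal)

    def step(self, action):
        x = self.pos[0] + self.MOVES[action][0]
        y = self.pos[1] + self.MOVES[action][1]
        if not (0 < x < self.size - 1 and 0 < y < self.size - 1) \
                or (x, y) in self.obstacles:
            self.pos = self.start          # hit a wall or equipment: restart
            return self.pos, -10.0, False
        self.pos = (x, y)
        if self.pos == self.goal:
            return self.pos, 100.0, True   # target position reached
        # -1 per action, plus distance shaping scaled into [0, 2].
        shaping = 2.0 * (1.0 - math.dist(self.pos, self.goal) / self.max_d)
        return self.pos, -1.0 + shaping, False
```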
The same parameters are used for all methods in the simulation experiments, as shown in Table 1. To keep the experimental results as accurate as possible, each method is run 20 times with 3000 episodes per run, and the mean of the 20 runs is analyzed as the experimental result. The information-strength influence-magnitude parameter of PSG-HAQL is set to 1.5; HAQL is the heuristic Q-learning of document [8], and the η of H-HAQL and L-HAQL is set to 1.5 and 0.1 respectively for comparison with PSG-HAQL.
Table 1 Simulation experiment parameter settings
Experimental results and analysis
Using the above simulation environment and parameter settings, simulation experiments are carried out with the PSG-HAQL, H-HAQL, L-HAQL and Standard-QL algorithms respectively.
The following 3 metrics are used to describe the experimental results:
Cumulative success rate of the learning process: the ratio of the number of learning episodes that reach the target position to the total number of learning episodes;
Steps used per episode: the number of steps used to find the target position in one learning episode; if the target is not reached, the step count is 0;
Cumulative return per episode: the cumulative return obtained in one episode of learning from the initial state to a terminal state (an obstacle or the target position).
To get an overall picture of the performance of the four algorithms, first consider the cumulative success rate curves of the learning process, shown in Fig. 5, where the horizontal axis is the learning episode number and the vertical axis is the success rate. From Fig. 5, the curves of PSG-HAQL and H-HAQL are clearly better than the success rate curves of L-HAQL and Standard-QL, confirming the finding of document [8] that a heuristic function can accelerate the learning speed of reinforcement learning algorithms. Moreover, the success rate curve of PSG-HAQL begins to rise early and its initial slope is the largest, showing that PSG-HAQL reaches the target position most frequently in the early stage of training; in total success rate, PSG-HAQL is also above the other three algorithms.
The success rate curve only counts whether each episode reaches the target position, so the per-episode behaviour of the four algorithms cannot be judged from it directly. The steps used per episode are therefore counted; the curve is shown in Fig. 6, where the horizontal axis is the learning episode number and the vertical axis is the steps used per episode. Although PSG-HAQL finds the target position first in the statistics averaged over 20 runs, the experiments show that the order in which the four algorithms first find the target position in any single run cannot be determined: any of them may find it first, because their initial exploration directions are random. In Fig. 6, because PSG-HAQL uses the heuristic function and its policy selects actions according to the fitness situation, its step count is lower overall than the other three. Although H-HAQL also has a large heuristic function, it is more easily trapped in local optima, so its step counts are generally worse than PSG-HAQL's. L-HAQL, whose heuristic function strength is small, behaves similarly to Standard-QL: its step count occasionally reaches the minimum but fluctuates considerably. Overall, PSG-HAQL stabilizes on the minimum-step action selection policy fastest.
The Agent can reach the target position by different paths, and different paths mostly require different numbers of steps, although paths of equal length are possible. The cumulative return per episode is therefore examined, as shown in Fig. 7, where the horizontal axis is the learning episode number and the vertical axis is the cumulative return obtained per episode. In Fig. 7, the overall shape of the cumulative return curves is similar to the steps-per-episode curves. PSG-HAQL stabilizes at roughly episode 400 and H-HAQL at roughly episode 1100, while L-HAQL and Standard-QL still fluctuate considerably and do not settle on the optimal actions. The results show that PSG-HAQL obtains a high-cumulative-return action policy faster, a policy of equivalent return that the other algorithms cannot stabilize on within the same time, demonstrating that PSG-HAQL effectively improves the convergence speed of the action selection policy.
The PSG-HAQL algorithm attaches the idea of swarm information transmission to the heuristic Q-learning method: during training, the Agent continuously obtains the fitness of different strategies to update the policy information strength online, and uses the information strength as the Q-learning heuristic function, so that the Agent selects strategies with high information strength with higher probability. In this way, the information-strength-guided heuristic Q-learning (PSG-HAQL) algorithm finds the optimal policy more efficiently, further reducing the training time.
The above is only the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (4)

1. A transformer substation inspection robot path planning system, characterized in that it is based on information-strength-guided heuristic Q-learning and comprises a central control module, a distance sensor module, an RFID module and a motion control module; the distance sensor module consists of 7 range sensors and sends the measured range data to the central control module for obstacle avoidance; the RFID module consists of RFID tags distributed at fixed points and an RFID reader on the inspection robot, and sends RFID landmark data and target-location data to the central control module for position calibration of the robot and determination of the target location; the motion control module receives commands from the central control module and sets the direction of motion; the central control module is the inspection robot's Agent, receiving the data transmitted by the other modules, determining the action policy, and sending commands to the motion control module to plan the path.
2. The transformer substation inspection robot path planning system of claim 1, characterized in that, with the inspection robot's forward direction as the zero-degree line, the seven range sensors are mounted on the robot at -90°, -60°, -30°, 0°, 30°, 60° and 90° respectively.
3. The transformer substation inspection robot path planning system of claim 1, characterized in that the establishment of the inspection robot's reward and punishment mechanism is completed by the following steps:
Step 1: set the movement rewards: to encourage the robot to move to the target point in as few steps as possible, every executed action produces a punishment value; at the same time, to encourage the robot to judge ahead and avoid unnecessary large-angle movements, large-angle movements are punished somewhat more heavily; specifically, when the action belongs to {-30°, 0°, 30°}, the punishment value is -0.2, and when the action belongs to {-60°, 60°}, the punishment value is -0.5;
Step 2: set the target-location rewards: the positions of the inspection robot and the target equipment are calibrated using RFID; after each step of the robot, the distance d between the current position and the target location is calculated, and -d (the negated distance) is taken as the current target reward; the reward for reaching the target location is set to +100;
Step 3: set the obstacle-avoidance rewards, using two levels: when any of the seven range sensors measures less than 0.1 m, the robot is considered to have hit an obstacle, the punishment value is -100, and, treating this as a terminal state, the current episode ends and the next episode begins; when any of the seven range sensors measures more than 0.1 m but less than half the robot's body height, the punishment value is -2, to encourage early avoidance.
4. The transformer substation inspection robot path planning system of claim 1, characterized in that the central control module completes the planning of the inspection robot's path based on the following steps:
Step 1: initialize the Agent: initialize the state-action value function and the heuristic function; determine the target equipment position and the inspection start position;
Step 2: design table H to record information strength: table H is defined as the four-tuple <s_i, a_i, p(s_i, a_i), f_max>, where s_i is the information state whose information strength is to be updated, a_i is the information action whose information strength is to be updated, p(s_i, a_i) is the updated information strength, a scalar proportional to fitness, and f_max is the maximum fitness previously recorded for the information state s_i;
Step 3: update the state-action value function by the Q-learning update rule

$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ R + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
Step 4: update the fitness maximum: the fitness value is defined as the discounted cumulative return obtained as the Agent moves from the initial state to the target state during one episode of training, $f = \sum_{t} \beta^{t} R_{t}$, where β is the fitness discount factor and R is the return obtained by each movement of the Agent; when the fitness value obtained by the Agent on completing an episode exceeds the maximum fitness in table H, the fitness maximum is updated;
Step 5: update the information strength: if the fitness maximum is updated, the information strength is updated accordingly by the rule

$$p(s_i, a_i) = \begin{cases} p(s_i, a_i) \cdot \dfrac{f_{\max}}{f}, & \text{if } a_i \neq a_t \\ 1, & \text{if } a_i = a_t \end{cases}$$

where a_t is the action taken in state s_i in the Agent's latest episode of learning, a_i is an information action in table H, and f_max is the fitness maximum in table H;
Step 6: determine the heuristic function based on information strength: so that the obtained information strengths are reflected directly in action selection, the information strength is folded into the heuristic function, and an influence-magnitude parameter controls how strongly the information strength affects action selection; the heuristic function is updated as

$$H_t(s_t, a) = \begin{cases} \max\limits_{a} Q(s_t, a) - Q(s_t, a_t) + \dfrac{p(s_t, a_t)}{\sum_{a} p(s_t, a)} \, U, & \text{if } a_t = \pi_p(s_t) \\ 0, & \text{otherwise} \end{cases}$$

where π_p(s_t) is the optimal action under the information-strength heuristic, the ratio of the maximum information strength to the sum of the information strengths (denoted h) represents the importance of that action, and U is the influence-magnitude parameter of the information strength on action selection, with larger U meaning greater influence; in this rule only the heuristic function of the optimal action under the information-strength heuristic is updated and acts on the selection of the action policy, while the heuristic functions of all other actions are set to 0; when the value function of the pheromone-optimal action is lower than that of another action, superimposing the heuristic function makes action selection favour the action with the larger pheromone strength, rather than the action whose value function happens to be larger under incomplete exploration;
Step 7: determine the policy under the joint action of the heuristic function and the value function: the action selection policy of information-strength-guided heuristic Q-learning uses the Boltzmann mechanism

$$P(a_i \mid s) = \frac{e^{[Q(s, a_i) + H(s, a_i)]/T}}{\sum_{k=1}^{N} e^{[Q(s, a_k) + H(s, a_k)]/T}}$$

where T is the temperature parameter; under the Boltzmann mechanism, if the action with the current maximum action value is not the pheromone-optimal action, superimposing H(s_t, a) on Q(s_t, a) increases the selection probability of the pheromone-optimal action; when the pheromone strengths of different actions are close, the mechanism keeps the selection probabilities of the maximum-value action and the pheromone-optimal action close, avoiding entrapment in a local optimum under pheromone strength; when the pheromone-strength gap is large, action selection is biased toward the pheromone-optimal action, which helps the algorithm converge.
CN201710153238.1A 2017-03-15 Transformer substation inspection robot path planning system Expired - Fee Related CN106843225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710153238.1A CN106843225B (en) 2017-03-15 Transformer substation inspection robot path planning system


Publications (2)

Publication Number Publication Date
CN106843225A (en) 2017-06-13
CN106843225B (en) 2020-03-27




Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5608843A (en) * 1994-08-01 1997-03-04 The United States Of America As Represented By The Secretary Of The Air Force Learning controller with advantage updating algorithm
CN102082466A (en) * 2010-10-15 2011-06-01 重庆市电力公司超高压局 Intelligent inspection robot system for transformer substation equipment
CN102280826A (en) * 2011-07-30 2011-12-14 山东鲁能智能技术有限公司 Intelligent robot inspection system and intelligent robot inspection method for transformer station
CN102420392A (en) * 2011-07-30 2012-04-18 山东鲁能智能技术有限公司 Transformer substation inspection robot global path planning method based on magnetic navigation
CN102819264B (en) * 2012-07-30 2015-01-21 山东大学 Q-learning initialization method for mobile robot path planning
US20150100530A1 (en) * 2013-10-08 2015-04-09 Google Inc. Methods and apparatus for reinforcement learning
CN105259899A (en) * 2015-12-01 2016-01-20 国网重庆市电力公司电力科学研究院 Control system for transformer substation patrol robot
CN106228314A (en) * 2016-08-11 2016-12-14 电子科技大学 Workflow scheduling method based on deep reinforcement learning
CN205950750U (en) * 2016-08-18 2017-02-15 广西电网有限责任公司北海供电局 Substation inspection robot navigation control system based on inertial navigation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AMIT KONAR, et al.: "A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot", IEEE Transactions on Systems, Man, and Cybernetics: Systems *
HU Jun, et al.: "Rolling Q-learning based robot path planning with prior knowledge in unknown environments", Control and Decision *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108406767A (en) * 2018-02-13 2018-08-17 华南理工大学 Robot autonomous learning method towards man-machine collaboration
CN108919796A (en) * 2018-06-04 2018-11-30 浙江立石机器人技术有限公司 Crusing robot and cruising inspection system
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 Intelligent robot decision-making method for real-time underwater environments based on memory association and embedded reinforcement learning
CN110302539A (en) * 2019-08-05 2019-10-08 苏州大学 Game strategy calculation method, device, system and readable storage medium
CN110672101B (en) * 2019-09-20 2021-09-28 北京百度网讯科技有限公司 Navigation model training method and device, electronic equipment and storage medium
CN110672101A (en) * 2019-09-20 2020-01-10 北京百度网讯科技有限公司 Navigation model training method and device, electronic equipment and storage medium
CN110752668A (en) * 2019-10-25 2020-02-04 国网陕西省电力公司电力科学研究院 Inspection system and inspection method for closed cabinet of transformer substation
CN113064419A (en) * 2019-12-30 2021-07-02 南京德朔实业有限公司 Intelligent mowing system and channel identification method thereof
CN111638646A (en) * 2020-05-29 2020-09-08 平安科技(深圳)有限公司 Four-legged robot walking controller training method and device, terminal and storage medium
CN111638646B (en) * 2020-05-29 2024-05-28 平安科技(深圳)有限公司 Training method and device for walking controller of quadruped robot, terminal and storage medium
CN111696370A (en) * 2020-06-16 2020-09-22 西安电子科技大学 Traffic light control method based on heuristic deep Q network
CN111857142A (en) * 2020-07-17 2020-10-30 广州大学 Path planning obstacle avoidance auxiliary method based on reinforcement learning
CN111857142B (en) * 2020-07-17 2022-08-02 广州大学 Path planning obstacle avoidance auxiliary method based on reinforcement learning
CN112558601A (en) * 2020-11-09 2021-03-26 广东电网有限责任公司广州供电局 Robot real-time scheduling method and system based on Q-learning algorithm and water drop algorithm
CN112558601B (en) * 2020-11-09 2024-04-02 广东电网有限责任公司广州供电局 Robot real-time scheduling method and system based on Q-learning algorithm and water drop algorithm
CN112612273A (en) * 2020-12-21 2021-04-06 南方电网电力科技股份有限公司 Routing inspection robot obstacle avoidance path planning method, system, equipment and medium
CN112833885A (en) * 2021-01-22 2021-05-25 牧原食品股份有限公司 Track navigation method, device and medium
CN112836777A (en) * 2021-03-02 2021-05-25 同济大学 Application method of consensus initiative mechanism in group robot target search
CN113515119A (en) * 2021-04-25 2021-10-19 华北电力大学 Routing planning scheme of inspection robot in transformer substation based on reinforcement learning
CN113298386A (en) * 2021-05-27 2021-08-24 广西大学 Distributed multi-target depth deterministic value network robot energy management method
CN113298386B (en) * 2021-05-27 2023-08-29 广西大学 Distributed multi-target depth deterministic value network robot energy management method
CN113858231A (en) * 2021-10-28 2021-12-31 武汉希文科技股份有限公司 Control method of transformer substation track robot system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200327

Termination date: 20210315