CN114326734B - Path planning method and device - Google Patents

Path planning method and device

Info

Publication number
CN114326734B
Authority
CN
China
Prior art keywords
decision
value
behavior
planned
relative distance
Prior art date
Legal status
Active
Application number
CN202111635189.8A
Other languages
Chinese (zh)
Other versions
CN114326734A (en)
Inventor
薛均晓
董博威
万里红
冷洁
张世文
Current Assignee
Zhongyuan Power Intelligent Robot Co ltd
Original Assignee
Zhongyuan Power Intelligent Robot Co ltd
Priority date
Filing date
Publication date
Application filed by Zhongyuan Power Intelligent Robot Co ltd filed Critical Zhongyuan Power Intelligent Robot Co ltd
Priority to CN202111635189.8A priority Critical patent/CN114326734B/en
Publication of CN114326734A publication Critical patent/CN114326734A/en
Application granted granted Critical
Publication of CN114326734B publication Critical patent/CN114326734B/en

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a path planning method and device. The method first obtains a plurality of first relative distances between an intelligent device to be planned and a plurality of first obstacles; the first relative distances are then screened according to a local perception condition to obtain second relative distances, which are set as the local environment state and input into a neural network, so that the neural network performs path planning for the intelligent device according to that local environment state. Embodiments of the invention improve the accuracy of obstacle avoidance in dense scenes.

Description

Path planning method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a path planning method and apparatus.
Background
Path planning is a core component of autonomous movement for intelligent devices: the aim is to find an optimal path from a starting point to an end point within a preset area, under optimization targets such as minimum time or shortest distance.
Path planning becomes more difficult in dense scenarios: a large number of obstacles not only produces a larger observation space but also requires the agent to plan paths and avoid obstacles in real time more quickly. A global path in a dense scene typically has to avoid many obstacles, so a great deal of time is spent exploring the environment to learn obstacle avoidance behavior; convergence is slow, or training may not converge at all, and the accuracy of path planning is consequently low.
In summary, existing path planning methods suffer from low obstacle avoidance accuracy in dense scenes.
Disclosure of Invention
The embodiments of the invention provide a path planning method and device that improve the accuracy of obstacle avoidance in dense scenes.
A first aspect of an embodiment of the present application provides a path planning method, including:
acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
and after screening the plurality of first relative distances according to a local perception condition to obtain a second relative distance, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
In a possible implementation manner of the first aspect, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, is specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
and updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches the target position, completing the path planning of the intelligent device to be planned.
In a possible implementation manner of the first aspect, judging that the decision behavior is an obstacle avoidance behavior is specifically:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the first aspect, the method further includes:
acquiring a real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In a possible implementation manner of the first aspect, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is smaller than the first preset value, taking that first relative distance as a second relative distance.
A second aspect of an embodiment of the present application provides a path planning apparatus, including: an acquisition module and a planning module;
the intelligent device comprises an acquisition module, a first obstacle acquisition module and a second obstacle acquisition module, wherein the acquisition module is used for acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a second relative distance from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the second relative distance into the neural network so as to enable the neural network to carry out path planning on the intelligent equipment to be planned according to the local environment state.
In a possible implementation manner of the second aspect, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, is specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
and updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches the target position, completing the path planning of the intelligent device to be planned.
In a possible implementation manner of the second aspect, judging that the decision behavior is an obstacle avoidance behavior is specifically:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the second aspect, the apparatus is further configured for:
acquiring a real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In a possible implementation manner of the second aspect, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is smaller than the first preset value, taking that first relative distance as a second relative distance.
Compared with the prior art, the path planning method and apparatus provided by the embodiments of the invention first obtain a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; the first relative distances are then screened according to the local perception condition to obtain second relative distances, which are set as the local environment state and input into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
The beneficial effects are as follows: after the second relative distances are obtained by screening according to the local perception condition, they are set as the local environment state and input into the neural network. This retains the key environment state while reducing environmental complexity; in a scene with high-density obstacles it shortens the time the intelligent device to be planned spends learning obstacle avoidance behavior from the environment, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiments of the invention introduce global guidance by adding an angle constraint, guiding the intelligent device to be planned from the global environment: a certain punishment is given when the movement angle exceeds the preset constraint angle, and an appropriate reward is given when the movement angle is smaller than it, so the device gradually learns to move within a fixed range of angles, which effectively prevents the intelligent device to be planned from getting stuck and unable to advance in the local environment.
In addition, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the embodiments of the invention calculate the reward value with approaching the target position as the optimization target, so that each decision step other than obstacle avoidance heads directly toward the target position; the finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
Drawings
Fig. 1 is a flow chart of a path planning method according to an embodiment of the present invention;
FIG. 2 is a schematic view of a movement angle according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a path planning apparatus according to an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to fig. 1, a flow chart of a path planning method according to an embodiment of the present invention; the method includes steps S101-S102.
S101: acquire a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively.
Preferably, the first obstacles are obstacles in a dense scene.
S102: after screening the plurality of first relative distances according to the local perception condition to obtain second relative distances, set the second relative distances as the local environment state and input it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
Specifically, in a scenario with high-density obstacles, taking every first relative distance as the environment state produces a large state space, yet obstacles at different distances affect the agent (i.e., the intelligent device to be planned) differently: an obstacle far from the device is unlikely to collide with it in the next step. The first relative distances are therefore screened according to the local perception condition to obtain the second relative distances, i.e., the relative distances between nearby obstacles and the intelligent device to be planned. Setting the second relative distances as the local environment state and inputting them into the neural network retains the key environment state while reducing environmental complexity; in a high-density obstacle scene this shortens the time the intelligent device to be planned spends learning obstacle avoidance behavior from the environment, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves its obstacle avoidance accuracy.
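As an illustration of this screening step, the following Python sketch (not taken from the patent; the function name, the (dx, dy) tuple representation, and the threshold value are assumptions) keeps only the obstacles whose first relative distance falls below the first preset value:

```python
from math import hypot
from typing import List, Tuple

def screen_local_distances(
    first_distances: List[Tuple[float, float]],  # (dx, dy) per first obstacle
    first_preset_value: float,                   # the local perception condition
) -> List[Tuple[float, float]]:
    """Keep, as second relative distances, only the first relative
    distances smaller than the first preset value."""
    return [d for d in first_distances if hypot(*d) < first_preset_value]

# Only the nearby obstacle survives the screening.
print(screen_local_distances([(0.5, 0.2), (8.0, 6.0)], first_preset_value=2.0))
```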
In this embodiment, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, is specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
and updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches the target position, completing the path planning of the intelligent device to be planned.
Specifically, the decision behavior includes a preset advance distance of the intelligent device to be planned, which may be represented by the following coordinates:
a(STEP*a[0], STEP*a[1]);
wherein a represents the decision behavior; STEP is a preset fixed step length used to scale the action space; and the preset advance distance of the intelligent device to be planned comprises a movement distance STEP*a[0] in the X direction and a movement distance STEP*a[1] in the Y direction.
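A minimal sketch of this action scaling, assuming the decision network outputs a raw behavior a with components in [-1, 1]; the STEP value here is illustrative only:

```python
STEP = 0.5  # preset fixed step length used to scale the action space (assumed)

def scale_action(a):
    """Map a raw decision behavior a = [a[0], a[1]] to the preset advance
    distance: STEP*a[0] in the X direction, STEP*a[1] in the Y direction."""
    return (STEP * a[0], STEP * a[1])

dx, dy = scale_action([0.8, -0.3])  # advance 0.4 in X and -0.15 in Y
```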
In a specific embodiment, the second relative distance is set as the local environment state and input into the decision network, specifically:
the normalized second relative distance is set as the local environment state and input into the decision network; from the definitions below, the normalization can be written as
d_x^i = (X_i - X_0) / W,  d_y^i = (Y_i - Y_0) / H;
wherein the second relative distance comprises a second relative distance d_x^i in the x-direction and a second relative distance d_y^i in the y-direction; [X_0, Y_0] is the coordinate position of the intelligent device to be planned; [X_i, Y_i] is the coordinate position of the i-th first obstacle; W is the width of the environment; and H is the height of the environment.
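A short sketch of the normalization as reconstructed above; the coordinate values and environment dimensions are illustrative assumptions:

```python
def normalized_second_distance(x0, y0, xi, yi, w, h):
    """Normalized second relative distance to the i-th first obstacle:
    the x-component is divided by the environment width, the y-component
    by the environment height."""
    return ((xi - x0) / w, (yi - y0) / h)

# Device at (1, 2), obstacle at (3, 5), in a 10 x 10 environment.
print(normalized_second_distance(1.0, 2.0, 3.0, 5.0, 10.0, 10.0))  # (0.2, 0.3)
```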
In a specific embodiment, judging that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
That is: before each movement action is executed, the movement action of the previous step (i.e., the historical movement behavior) is recorded as a_{t-1}. In the current environmental state, a_{t-1} is compared with the current decision behavior a_t: if executing the previous movement action in the current state would cause a collision while executing the current movement action (i.e., the decision behavior) would not, the current movement action is judged to be an obstacle avoidance behavior, which completes the screening of obstacle avoidance behaviors.
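The screening rule can be sketched as follows, where would_collide stands in for a collision query against the current environmental state; it is a hypothetical helper supplied by the simulator, not part of the patent:

```python
def is_obstacle_avoidance(state, a_prev, a_curr, would_collide) -> bool:
    """a_curr is an obstacle avoidance behavior iff repeating the historical
    behavior a_prev from the current state would collide with a first
    obstacle while executing a_curr would not."""
    return would_collide(state, a_prev) and not would_collide(state, a_curr)
```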
After the second relative distances are screened from the first relative distances according to the local perception condition and set as the local environment state input into the neural network, the environment is only observed locally: the intelligent device to be planned interacts only with a small range of its surroundings. The path planning problem, however, requires a global path, and global path exploration is difficult to achieve through environment interaction within a local range alone. A global guidance mode therefore needs to be introduced by adding an angle constraint, so as to guide the behavior of the intelligent device to be planned from the global environment. The angle constraint is specifically: acquire the real-time moving direction of the intelligent device to be planned, and calculate the movement angle according to the real-time moving direction and the target position; calculate a second reward value according to the movement angle and the preset constraint angle; and when the movement angle is smaller than the preset constraint angle, give the second reward value to the movement angle.
Further, the intelligent device to be planned gradually searches for a feasible path by exploring the environment, and the angle constraint limits the agent's moving direction: a certain punishment is given when the movement angle exceeds the preset constraint angle, and an appropriate reward is given when the movement angle is smaller than it, so the intelligent device to be planned gradually learns to move within a fixed range of angles.
Specifically, the calculation of the second reward value may be represented by the following equation:
R=(15-θ)*γ;
wherein R is the second reward value, 15 is the preset constraint angle, θ is the movement angle, γ is a scale factor, and (15-θ) represents the included angle. The smaller the included angle, the closer the advancing direction is to the target direction and the larger the reward value; the larger the included angle, the smaller the reward value.
Further, when the included angle exceeds the set angle difference, a punishment is given, and the larger the included angle, the heavier the punishment.
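A small sketch of this reward rule; the scale factor value is an assumption, and the negative branch of the same formula supplies the punishment:

```python
CONSTRAINT_ANGLE = 15.0  # preset constraint angle, per the formula above
GAMMA_SCALE = 0.1        # scale factor gamma; value assumed for illustration

def second_reward(theta: float) -> float:
    """R = (15 - theta) * gamma: a positive reward within the constraint
    angle, and a punishment that grows as the included angle grows."""
    return (CONSTRAINT_ANGLE - theta) * GAMMA_SCALE

print(second_reward(5.0))   # within the constraint angle: reward
print(second_reward(40.0))  # beyond it: punishment grows with the angle
```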
In this embodiment, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
In this embodiment, the path planning problem is modeled as a reinforcement learning problem, and global path planning is implemented through sequential decision-making. Concretely: the intelligent device to be planned acquires the environmental state and makes a decision behavior through the decision network (the decision behavior includes a preset advance distance and a preset advance direction); the intelligent device to be planned is controlled to move according to the decision behavior; when the environmental state changes, the changed environmental state is input into the decision network again; and this decision process repeats until the intelligent device to be planned reaches the target position.
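The sequential decision loop can be sketched as below; env and decision_net are assumed interfaces standing in for the patent's environment and decision network:

```python
def plan_path(env, decision_net, max_steps=1000):
    """Sequential decision loop: observe, decide, move, and repeat until
    the intelligent device to be planned reaches the target position."""
    state = env.observe_local_state()      # screened, normalized distances
    trajectory = [env.position()]
    for _ in range(max_steps):
        action = decision_net(state)       # decision behavior
        env.move(action)                   # control the device to move
        trajectory.append(env.position())
        if env.reached_target():
            break
        state = env.observe_local_state()  # changed environmental state
    return trajectory
```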
In model-free reinforcement learning, the transition probabilities between states are unknown, and the learning process consists mainly of policy evaluation and policy improvement. 1. Policy evaluation: the current policy is evaluated by computing value functions, including a state value function and a behavior value function, with values estimated from random samples serving as the evaluation standard. A neural network is used to fit the value function and directly output the concrete value; the gap between that value and the actual value is then reduced by updating the network parameters. 2. Policy improvement: after the policy evaluation value is obtained, the policy is updated according to the evaluation value and gradually improved so that a higher value can be obtained; this improvement process maps concretely to updating the network parameters.
Network update: the neural network used in the embodiment of the invention mainly comprises two parts, a decision network and a value network. The decision network is used to output decision behaviors, and the value network is used to evaluate them. Both the decision network and the value network are updated by gradient descent.
The parameter gradient of the decision network takes the deterministic-policy-gradient form
∇_{θ^μ} J ≈ (1/n) Σ_{i=1}^{n} ∇_a Q(s_i, a | θ^Q)|_{a=a_i} · ∇_{θ^μ} μ(s_i | θ^μ);
wherein ∇_a Q is the value gradient, obtained by differentiating the value network, so that maximizing the value is the update target; μ(·|θ^μ) is the decision network with parameters θ^μ; s_i is the environmental state at the i-th moment; a_i is the action at the i-th moment; θ^Q is the value network parameter; and n is the number of samples extracted from the experience pool at a time. Since gradient descent is adopted, the negative of this gradient is used for the update, which realizes value maximization.
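A minimal PyTorch sketch of this decision network update, on the assumption that it follows the standard deterministic policy gradient matching the formula above; the architectures and learning rate are illustrative:

```python
import torch
import torch.nn as nn

# Decision (actor) and value (critic) networks; sizes are illustrative.
actor = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(4 + 2, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)  # assumed rate

def update_decision_network(states: torch.Tensor) -> None:
    """One update step; states is an (n, 4) minibatch from the experience
    pool. Descending the loss -Q(s, mu(s)) applies the negative gradient,
    which maximizes the value, as described above."""
    actions = actor(states)
    loss = -critic(torch.cat([states, actions], dim=1)).mean()
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
```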
The parameter gradient of the value network follows from minimizing the loss
L(θ^Q) = (1/n) Σ_{i=1}^{n} (y_i - Q(s_i, a_i | θ^Q))^2;
wherein y_i = r_i + γQ′(s_{i+1}, a_{i+1} | θ^{Q′}) is the value criterion at the current moment, and r_i is the reward value fed back by the environment at the current moment. The network update target is to minimize the gap to this target value. Further, s_{i+1} is the environmental state at the (i+1)-th moment; a_{i+1} is the action at the (i+1)-th moment; θ^{Q′} is the parameter of the target value network; and n is the number of samples extracted from the experience pool at a time.
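A companion sketch for the value network update, continuing the previous sketch (actor and critic are defined there); the target networks, discount factor, and learning rate are assumptions in line with standard practice:

```python
import copy

critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # assumed rate
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)
DISCOUNT = 0.99  # gamma in y_i = r_i + gamma * Q'(s_{i+1}, a_{i+1})

def update_value_network(s, a, r, s_next):
    """Minimize (1/n) * sum_i (y_i - Q(s_i, a_i))^2 over a minibatch:
    s (n, 4) states, a (n, 2) actions, r (n, 1) reward values,
    s_next (n, 4) next states, all drawn from the experience pool."""
    with torch.no_grad():
        a_next = target_actor(s_next)  # a_{i+1}
        y = r + DISCOUNT * target_critic(torch.cat([s_next, a_next], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    loss = nn.functional.mse_loss(q, y)
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
```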
To further describe the calculation of the movement angle, refer to fig. 2, a schematic diagram of the movement angle according to an embodiment of the present invention.
Wherein a[x, y] represents the decision behavior, (x_0, y_0) represents the departure position of the intelligent device to be planned, and (x_i, y_i) represents its target position.
The movement angle θ is calculated by the following formula:
θ = arctan(y/x) - arctan((y_i - y_0)/(x_i - x_0));
wherein x is the distance the intelligent device to be planned moves in the X direction and y is the distance it moves in the Y direction, so (x, y) gives the real-time moving direction, while (x_i - x_0) and (y_i - y_0) give the direction from the departure position to the target position.
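The movement angle computation can be sketched as follows; math.atan2 replaces the bare arctan so that all quadrants are handled, and returning the absolute value in degrees is an implementation choice not stated in the patent:

```python
import math

def movement_angle(x, y, x0, y0, xi, yi):
    """Movement angle theta (in degrees) between the real-time moving
    direction (x, y) and the departure-to-target direction."""
    theta = math.atan2(y, x) - math.atan2(yi - y0, xi - x0)
    return abs(math.degrees(theta))

# Example: moving mostly along X while the target lies up and to the right.
print(movement_angle(1.0, 0.2, 0.0, 0.0, 5.0, 3.0))
```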
For further explanation of the path planning apparatus, refer to fig. 3, a schematic structural diagram of a path planning apparatus according to an embodiment of the present invention, which includes: an acquisition module 301 and a planning module 302;
the acquisition module 301 is configured to acquire a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
the planning module 302 is configured to screen a second relative distance from the plurality of first relative distances according to the local perception condition, set the second relative distance as a local environment state, and input it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
In this embodiment, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, is specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
and updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches the target position, completing the path planning of the intelligent device to be planned.
In this embodiment, judging that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior; wherein the historical movement behavior is the movement behavior preceding the decision behavior.
In this embodiment, the apparatus is further configured for:
acquiring the real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In this embodiment, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
According to the embodiment of the invention, the acquisition module acquires a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; the planning module screens the second relative distances from the plurality of first relative distances according to the local perception condition, sets them as the local environment state, and inputs them into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
According to the embodiment of the invention, after the second relative distances are obtained by screening according to the local perception condition, they are set as the local environment state and input into the neural network. This retains the key environment state while reducing environmental complexity; in a scene with high-density obstacles it shortens the time the intelligent device to be planned spends learning obstacle avoidance behavior from the environment, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiment of the invention introduces global guidance by adding an angle constraint, guiding the intelligent device to be planned from the global environment: a certain punishment is given when the movement angle exceeds the preset constraint angle, and an appropriate reward is given when the movement angle is smaller than it, so the device gradually learns to move within a fixed range of angles, which effectively prevents the intelligent device to be planned from getting stuck and unable to advance in the local environment.
In addition, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the embodiment of the invention calculates the reward value with approaching the target position as the optimization target, so that each decision step other than obstacle avoidance heads directly toward the target position; the finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
The foregoing is a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and refinements without departing from the principles of the present invention, and such modifications and refinements are also within the scope of the present invention.

Claims (4)

1. A method of path planning, comprising:
acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
after screening the plurality of first relative distances according to a local perception condition to obtain a second relative distance, setting the second relative distance as a local environment state and inputting it into a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state;
wherein setting the second relative distance as the local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically comprises:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches a target position, thereby completing the path planning of the intelligent device to be planned;
further comprising:
acquiring a real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle;
wherein obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
2. The path planning method according to claim 1, wherein judging that the decision behavior is an obstacle avoidance behavior specifically comprises:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, judging the decision behavior to be an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
3. A path planning apparatus, comprising: an acquisition module and a planning module;
the intelligent device comprises an acquisition module, a first obstacle acquisition module and a second obstacle acquisition module, wherein the acquisition module is used for acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a plurality of first relative distances according to local perception conditions to obtain second relative distances, setting the second relative distances as local environment states and inputting the second relative distances into a neural network so that the neural network performs path planning on the intelligent equipment to be planned according to the local environment states;
the method comprises the steps of setting the second relative distance as a local environment state and inputting the second relative distance into a neural network, so that the neural network performs path planning on the intelligent equipment to be planned according to the local environment state, specifically comprising the following steps:
the neural network includes: decision networks and value networks;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates and obtains decision behaviors according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be obstacle avoidance behavior, the decision-making behavior is strengthened and learned through the value network, and a first rewarding value is given to the decision-making behavior;
inputting the first rewards value into the value network so that the value network calculates a strategy evaluation value of the decision action;
updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, so as to complete path planning of the intelligent equipment to be planned;
further comprises:
acquiring a real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating a second prize value according to the movement angle and a preset constraint angle;
when the movement angle is smaller than the preset constraint angle, giving the second rewarding value to the movement angle;
the second relative distance is obtained according to the local perception condition and the plurality of first relative distances, specifically:
the local perceptual conditions include: a first preset value;
and when the first relative distance is judged to be smaller than the first preset value, the first relative distance is taken as the second relative distance and acquired.
4. The path planning apparatus according to claim 3, wherein judging that the decision behavior is an obstacle avoidance behavior is specifically:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, judging the decision behavior to be an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
CN202111635189.8A 2021-12-29 2021-12-29 Path planning method and device Active CN114326734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111635189.8A CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device


Publications (2)

Publication Number Publication Date
CN114326734A CN114326734A (en) 2022-04-12
CN114326734B (en) 2024-03-08

Family

ID=81016080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111635189.8A Active CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device

Country Status (1)

Country Link
CN (1) CN114326734B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168324A (en) * 2017-06-08 2017-09-15 中国矿业大学 A kind of robot path planning method based on ANFIS fuzzy neural networks
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory
CN110083165A (en) * 2019-05-21 2019-08-02 大连大学 A kind of robot paths planning method under complicated narrow environment
CN111061277A (en) * 2019-12-31 2020-04-24 歌尔股份有限公司 Unmanned vehicle global path planning method and device
CN111399541A (en) * 2020-03-30 2020-07-10 西北工业大学 Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network
CN112977146A (en) * 2021-02-24 2021-06-18 中原动力智能机器人有限公司 Charging method and system for automatic driving vehicle and charging pile
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Dynamic obstacle avoidance method for carrier-based aircraft based on deep reinforcement learning; 薛均晓 et al.; Journal of Computer-Aided Design & Computer Graphics; 1102-1112 *
Robot path planning method based on improved DDPG in dense obstacle scenarios; 薛均晓 et al.; Proceedings of the 2022 China Automation Congress; 1-6 *
Genetic multi-point path planning for mobile robots; 梁宏伟 et al.; Engineering Technology; 584-587 *

Also Published As

Publication number Publication date
CN114326734A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN112937564B (en) Lane change decision model generation method and unmanned vehicle lane change decision method and device
US10324469B2 (en) System and method for controlling motion of vehicle in shared environment
WO2020243162A1 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
CN111487864B (en) Robot path navigation method and system based on deep reinforcement learning
JP2022506404A (en) Methods and devices for determining vehicle speed
WO2021208771A1 (en) Reinforced learning method and device
US11604469B2 (en) Route determining device, robot, and route determining method
CN112180950B (en) Intelligent ship autonomous collision avoidance and path planning method based on reinforcement learning
Rhinehart et al. Contingencies from observations: Tractable contingency planning with learned behavior models
CN113359853B (en) Route planning method and system for unmanned aerial vehicle formation cooperative target monitoring
CN113359859B (en) Combined navigation obstacle avoidance method, system, terminal equipment and storage medium
CN113189989B (en) Vehicle intention prediction method, device, equipment and storage medium
CN117093009B (en) Logistics AGV trolley navigation control method and system based on machine vision
CN115204044A (en) Method, apparatus and medium for generating trajectory prediction model and processing trajectory information
CN110288708A (en) A kind of map constructing method, device and terminal device
CN108121347B (en) Method and device for controlling movement of equipment and electronic equipment
Lee et al. Spatiotemporal costmap inference for MPC via deep inverse reinforcement learning
CN116300909A (en) Robot obstacle avoidance navigation method based on information preprocessing and reinforcement learning
CN116679711A (en) Robot obstacle avoidance method based on model-based reinforcement learning and model-free reinforcement learning
CN114326734B (en) Path planning method and device
Jin et al. Safe-Nav: learning to prevent PointGoal navigation failure in unknown environments
WO2023242223A1 (en) Motion prediction for mobile agents
Wang et al. Tracking moving target for 6 degree-of-freedom robot manipulator with adaptive visual servoing based on deep reinforcement learning PID controller
Ward et al. Towards risk minimizing trajectory planning in on-road scenarios
CN113158539A (en) Method for long-term trajectory prediction of traffic participants

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant