CN114326734B - Path planning method and device - Google Patents

Path planning method and device

Info

Publication number
CN114326734B
Authority
CN
China
Prior art keywords
decision
value
behavior
planned
relative distance
Prior art date
Legal status
Active
Application number
CN202111635189.8A
Other languages
Chinese (zh)
Other versions
CN114326734A (en)
Inventor
薛均晓
董博威
万里红
冷洁
张世文
Current Assignee
Zhongyuan Power Intelligent Robot Co ltd
Original Assignee
Zhongyuan Power Intelligent Robot Co ltd
Priority date
Filing date
Publication date
Application filed by Zhongyuan Power Intelligent Robot Co ltd filed Critical Zhongyuan Power Intelligent Robot Co ltd
Priority to CN202111635189.8A priority Critical patent/CN114326734B/en
Publication of CN114326734A publication Critical patent/CN114326734A/en
Application granted granted Critical
Publication of CN114326734B publication Critical patent/CN114326734B/en

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a path planning method and device. The method first obtains a plurality of first relative distances between an intelligent device to be planned and a plurality of first obstacles; the first relative distances are then screened according to a local perception condition to obtain second relative distances, which are set as the local environment state and input into a neural network, so that the neural network performs path planning for the intelligent device according to that local environment state. Embodiments of the invention improve the accuracy of obstacle avoidance in dense scenes.

Description

Path planning method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a path planning method and apparatus.
Background
Path planning is a core component of autonomous movement for intelligent devices: the aim is to find an optimal path from a starting point to an end point within a preset area, under optimization targets such as minimum time or shortest distance.
Path planning becomes more difficult in dense scenarios: a large number of obstacles not only produces a larger observation space but also requires the agent to plan paths and avoid obstacles in real time more quickly. A global path in a dense scene typically has to avoid many obstacles, so a great deal of time is spent exploring the environment to learn obstacle avoidance behavior; convergence is slow, or training may not converge at all, and the accuracy of path planning is consequently low.
In summary, existing path planning methods suffer from low obstacle avoidance accuracy in dense scenes.
Disclosure of Invention
The embodiments of the invention provide a path planning method and device that improve the accuracy of obstacle avoidance in dense scenes.
A first aspect of an embodiment of the present application provides a path planning method, including:
acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
and after screening the plurality of first relative distances according to a local perception condition to obtain a second relative distance, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
In a possible implementation manner of the first aspect, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, is specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
and updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches the target position, completing the path planning of the intelligent device to be planned.
In a possible implementation manner of the first aspect, judging that the decision behavior is an obstacle avoidance behavior is specifically:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the first aspect, the method further includes:
acquiring a real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In a possible implementation manner of the first aspect, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is smaller than the first preset value, taking that first relative distance as a second relative distance.
A second aspect of an embodiment of the present application provides a path planning apparatus, including: an acquisition module and a planning module;
the intelligent device comprises an acquisition module, a first obstacle acquisition module and a second obstacle acquisition module, wherein the acquisition module is used for acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a second relative distance from the plurality of first relative distances according to the local perception condition, setting the second relative distance as a local environment state and inputting the second relative distance into the neural network so as to enable the neural network to carry out path planning on the intelligent equipment to be planned according to the local environment state.
In a possible implementation manner of the second aspect, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, is specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
and updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches the target position, completing the path planning of the intelligent device to be planned.
In a possible implementation manner of the second aspect, judging that the decision behavior is an obstacle avoidance behavior is specifically:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
In a possible implementation manner of the second aspect, the apparatus is further configured for:
acquiring a real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In a possible implementation manner of the second aspect, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is smaller than the first preset value, taking that first relative distance as a second relative distance.
Compared with the prior art, the path planning method and apparatus provided by the embodiments of the invention first obtain a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; the first relative distances are then screened according to the local perception condition to obtain second relative distances, which are set as the local environment state and input into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
The beneficial effects are as follows: after the second relative distances are obtained by screening according to the local perception condition, they are set as the local environment state and input into the neural network. This retains the key environment state while reducing environmental complexity; in a scene with high-density obstacles it shortens the time the intelligent device to be planned spends learning obstacle avoidance behavior from the environment, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiments of the invention introduce global guidance by adding an angle constraint, guiding the intelligent device to be planned from the global environment: a certain punishment is given when the movement angle exceeds the preset constraint angle, and an appropriate reward is given when the movement angle is smaller than it, so the device gradually learns to move within a fixed range of angles, which effectively prevents the intelligent device to be planned from getting stuck and unable to advance in the local environment.
In addition, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the embodiments of the invention calculate the reward value with approaching the target position as the optimization target, so that each decision step other than obstacle avoidance heads directly toward the target position; the finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
Drawings
Fig. 1 is a flow chart of a path planning method according to an embodiment of the present invention;
FIG. 2 is a schematic view of a movement angle according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a path planning apparatus according to an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to fig. 1, a flow chart of a path planning method according to an embodiment of the present invention; the method includes steps S101-S102.
S101: acquire a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively.
Preferably, the first obstacles are obstacles in a dense scene.
S102: after screening the plurality of first relative distances according to the local perception condition to obtain second relative distances, set the second relative distances as the local environment state and input it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
Specifically, in a scenario with high-density obstacles, taking every first relative distance as the environment state produces a large state space, yet obstacles at different distances affect the agent (i.e., the intelligent device to be planned) differently: an obstacle far from the device is unlikely to collide with it in the next step. The first relative distances are therefore screened according to the local perception condition to obtain the second relative distances, i.e., the relative distances between nearby obstacles and the intelligent device to be planned. Setting the second relative distances as the local environment state and inputting them into the neural network retains the key environment state while reducing environmental complexity; in a high-density obstacle scene this shortens the time the intelligent device to be planned spends learning obstacle avoidance behavior from the environment, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves its obstacle avoidance accuracy.
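As an illustration of this screening step, the following Python sketch (not taken from the patent; the function name, the (dx, dy) tuple representation, and the threshold value are assumptions) keeps only the obstacles whose first relative distance falls below the first preset value:

```python
from math import hypot
from typing import List, Tuple

def screen_local_distances(
    first_distances: List[Tuple[float, float]],  # (dx, dy) per first obstacle
    first_preset_value: float,                   # the local perception condition
) -> List[Tuple[float, float]]:
    """Keep, as second relative distances, only the first relative
    distances smaller than the first preset value."""
    return [d for d in first_distances if hypot(*d) < first_preset_value]

# Only the nearby obstacle survives the screening.
print(screen_local_distances([(0.5, 0.2), (8.0, 6.0)], first_preset_value=2.0))
```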
In this embodiment, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, is specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
and updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches the target position, completing the path planning of the intelligent device to be planned.
Specifically, the decision behavior includes a preset advance distance of the intelligent device to be planned, which may be represented by the following coordinates:
a(STEP*a[0], STEP*a[1]);
wherein a represents the decision behavior; STEP is a preset fixed step length used to scale the action space; and the preset advance distance of the intelligent device to be planned comprises a movement distance STEP*a[0] in the X direction and a movement distance STEP*a[1] in the Y direction.
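A minimal sketch of this action scaling, assuming the decision network outputs a raw behavior a with components in [-1, 1]; the STEP value here is illustrative only:

```python
STEP = 0.5  # preset fixed step length used to scale the action space (assumed)

def scale_action(a):
    """Map a raw decision behavior a = [a[0], a[1]] to the preset advance
    distance: STEP*a[0] in the X direction, STEP*a[1] in the Y direction."""
    return (STEP * a[0], STEP * a[1])

dx, dy = scale_action([0.8, -0.3])  # advance 0.4 in X and -0.15 in Y
```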
In a specific embodiment, the second relative distance is set as the local environment state and input into the decision network, specifically:
the normalized second relative distance is set as the local environment state and input into the decision network; from the definitions below, the normalization can be written as
d_x^i = (X_i - X_0) / W,  d_y^i = (Y_i - Y_0) / H;
wherein the second relative distance comprises a second relative distance d_x^i in the x-direction and a second relative distance d_y^i in the y-direction; [X_0, Y_0] is the coordinate position of the intelligent device to be planned; [X_i, Y_i] is the coordinate position of the i-th first obstacle; W is the width of the environment; and H is the height of the environment.
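A short sketch of the normalization as reconstructed above; the coordinate values and environment dimensions are illustrative assumptions:

```python
def normalized_second_distance(x0, y0, xi, yi, w, h):
    """Normalized second relative distance to the i-th first obstacle:
    the x-component is divided by the environment width, the y-component
    by the environment height."""
    return ((xi - x0) / w, (yi - y0) / h)

# Device at (1, 2), obstacle at (3, 5), in a 10 x 10 environment.
print(normalized_second_distance(1.0, 2.0, 3.0, 5.0, 10.0, 10.0))  # (0.2, 0.3)
```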
In a specific embodiment, judging that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior.
That is: before each movement action is executed, the movement action of the previous step (i.e., the historical movement behavior) is recorded as a_{t-1}. In the current environmental state, a_{t-1} is compared with the current decision behavior a_t: if executing the previous movement action in the current state would cause a collision while executing the current movement action (i.e., the decision behavior) would not, the current movement action is judged to be an obstacle avoidance behavior, which completes the screening of obstacle avoidance behaviors.
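The screening rule can be sketched as follows, where would_collide stands in for a collision query against the current environmental state; it is a hypothetical helper supplied by the simulator, not part of the patent:

```python
def is_obstacle_avoidance(state, a_prev, a_curr, would_collide) -> bool:
    """a_curr is an obstacle avoidance behavior iff repeating the historical
    behavior a_prev from the current state would collide with a first
    obstacle while executing a_curr would not."""
    return would_collide(state, a_prev) and not would_collide(state, a_curr)
```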
After the second relative distances are screened from the first relative distances according to the local perception condition and set as the local environment state input into the neural network, the environment is only observed locally: the intelligent device to be planned interacts only with a small range of its surroundings. The path planning problem, however, requires a global path, and global path exploration is difficult to achieve through environment interaction within a local range alone. A global guidance mode therefore needs to be introduced by adding an angle constraint, so as to guide the behavior of the intelligent device to be planned from the global environment. The angle constraint is specifically: acquire the real-time moving direction of the intelligent device to be planned, and calculate the movement angle according to the real-time moving direction and the target position; calculate a second reward value according to the movement angle and the preset constraint angle; and when the movement angle is smaller than the preset constraint angle, give the second reward value to the movement angle.
Further, the intelligent device to be planned gradually searches for a feasible path by exploring the environment, and the angle constraint limits the agent's moving direction: a certain punishment is given when the movement angle exceeds the preset constraint angle, and an appropriate reward is given when the movement angle is smaller than it, so the intelligent device to be planned gradually learns to move within a fixed range of angles.
Specifically, the calculation of the second reward value may be represented by the following equation:
R=(15-θ)*γ;
wherein R is the second reward value, 15 is the preset constraint angle, θ is the movement angle, γ is a scale factor, and (15-θ) represents the included angle. The smaller the included angle, the closer the advancing direction is to the target direction and the larger the reward value; the larger the included angle, the smaller the reward value.
Further, when the included angle exceeds the set angle difference, a punishment is given, and the larger the included angle, the heavier the punishment.
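A small sketch of this reward rule; the scale factor value is an assumption, and the negative branch of the same formula supplies the punishment:

```python
CONSTRAINT_ANGLE = 15.0  # preset constraint angle, per the formula above
GAMMA_SCALE = 0.1        # scale factor gamma; value assumed for illustration

def second_reward(theta: float) -> float:
    """R = (15 - theta) * gamma: a positive reward within the constraint
    angle, and a punishment that grows as the included angle grows."""
    return (CONSTRAINT_ANGLE - theta) * GAMMA_SCALE

print(second_reward(5.0))   # within the constraint angle: reward
print(second_reward(40.0))  # beyond it: punishment grows with the angle
```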
In this embodiment, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
In this embodiment, the path planning problem is modeled as a reinforcement learning problem, and global path planning is implemented through sequential decision-making. Concretely: the intelligent device to be planned acquires the environmental state and makes a decision behavior through the decision network (the decision behavior includes a preset advance distance and a preset advance direction); the intelligent device to be planned is controlled to move according to the decision behavior; when the environmental state changes, the changed environmental state is input into the decision network again; and this decision process repeats until the intelligent device to be planned reaches the target position.
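The sequential decision loop can be sketched as below; env and decision_net are assumed interfaces standing in for the patent's environment and decision network:

```python
def plan_path(env, decision_net, max_steps=1000):
    """Sequential decision loop: observe, decide, move, and repeat until
    the intelligent device to be planned reaches the target position."""
    state = env.observe_local_state()      # screened, normalized distances
    trajectory = [env.position()]
    for _ in range(max_steps):
        action = decision_net(state)       # decision behavior
        env.move(action)                   # control the device to move
        trajectory.append(env.position())
        if env.reached_target():
            break
        state = env.observe_local_state()  # changed environmental state
    return trajectory
```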
In model-free reinforcement learning, the transition probabilities between states are unknown, and the learning process consists mainly of policy evaluation and policy improvement. 1. Policy evaluation: the current policy is evaluated by computing value functions, including a state value function and a behavior value function, with values estimated from random samples serving as the evaluation standard. A neural network is used to fit the value function and directly output the concrete value; the gap between that value and the actual value is then reduced by updating the network parameters. 2. Policy improvement: after the policy evaluation value is obtained, the policy is updated according to the evaluation value and gradually improved so that a higher value can be obtained; this improvement process maps concretely to updating the network parameters.
Network update: the neural network used in the embodiment of the invention mainly comprises two parts, a decision network and a value network. The decision network is used to output decision behaviors, and the value network is used to evaluate them. Both the decision network and the value network are updated by gradient descent.
The parameter gradient of the decision network takes the deterministic-policy-gradient form
∇_{θ^μ} J ≈ (1/n) Σ_{i=1}^{n} ∇_a Q(s_i, a | θ^Q)|_{a=a_i} · ∇_{θ^μ} μ(s_i | θ^μ);
wherein ∇_a Q is the value gradient, obtained by differentiating the value network, so that maximizing the value is the update target; μ(·|θ^μ) is the decision network with parameters θ^μ; s_i is the environmental state at the i-th moment; a_i is the action at the i-th moment; θ^Q is the value network parameter; and n is the number of samples extracted from the experience pool at a time. Since gradient descent is adopted, the negative of this gradient is used for the update, which realizes value maximization.
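A minimal PyTorch sketch of this decision network update, on the assumption that it follows the standard deterministic policy gradient matching the formula above; the architectures and learning rate are illustrative:

```python
import torch
import torch.nn as nn

# Decision (actor) and value (critic) networks; sizes are illustrative.
actor = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(4 + 2, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)  # assumed rate

def update_decision_network(states: torch.Tensor) -> None:
    """One update step; states is an (n, 4) minibatch from the experience
    pool. Descending the loss -Q(s, mu(s)) applies the negative gradient,
    which maximizes the value, as described above."""
    actions = actor(states)
    loss = -critic(torch.cat([states, actions], dim=1)).mean()
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
```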
The parameter gradient of the value network follows from minimizing the loss
L(θ^Q) = (1/n) Σ_{i=1}^{n} (y_i - Q(s_i, a_i | θ^Q))^2;
wherein y_i = r_i + γQ′(s_{i+1}, a_{i+1} | θ^{Q′}) is the value criterion at the current moment, and r_i is the reward value fed back by the environment at the current moment. The network update target is to minimize the gap to this target value. Further, s_{i+1} is the environmental state at the (i+1)-th moment; a_{i+1} is the action at the (i+1)-th moment; θ^{Q′} is the parameter of the target value network; and n is the number of samples extracted from the experience pool at a time.
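A companion sketch for the value network update, continuing the previous sketch (actor and critic are defined there); the target networks, discount factor, and learning rate are assumptions in line with standard practice:

```python
import copy

critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # assumed rate
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)
DISCOUNT = 0.99  # gamma in y_i = r_i + gamma * Q'(s_{i+1}, a_{i+1})

def update_value_network(s, a, r, s_next):
    """Minimize (1/n) * sum_i (y_i - Q(s_i, a_i))^2 over a minibatch:
    s (n, 4) states, a (n, 2) actions, r (n, 1) reward values,
    s_next (n, 4) next states, all drawn from the experience pool."""
    with torch.no_grad():
        a_next = target_actor(s_next)  # a_{i+1}
        y = r + DISCOUNT * target_critic(torch.cat([s_next, a_next], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    loss = nn.functional.mse_loss(q, y)
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
```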
To further describe the calculation of the movement angle, refer to fig. 2, a schematic diagram of the movement angle according to an embodiment of the present invention.
Wherein a[x, y] represents the decision behavior, (x_0, y_0) represents the departure position of the intelligent device to be planned, and (x_i, y_i) represents its target position.
The movement angle θ is calculated by the following formula:
θ = arctan(y/x) - arctan((y_i - y_0)/(x_i - x_0));
wherein x is the distance the intelligent device to be planned moves in the X direction and y is the distance it moves in the Y direction, so (x, y) gives the real-time moving direction, while (x_i - x_0) and (y_i - y_0) give the direction from the departure position to the target position.
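The movement angle computation can be sketched as follows; math.atan2 replaces the bare arctan so that all quadrants are handled, and returning the absolute value in degrees is an implementation choice not stated in the patent:

```python
import math

def movement_angle(x, y, x0, y0, xi, yi):
    """Movement angle theta (in degrees) between the real-time moving
    direction (x, y) and the departure-to-target direction."""
    theta = math.atan2(y, x) - math.atan2(yi - y0, xi - x0)
    return abs(math.degrees(theta))

# Example: moving mostly along X while the target lies up and to the right.
print(movement_angle(1.0, 0.2, 0.0, 0.0, 5.0, 3.0))
```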
For further explanation of the path planning apparatus, refer to fig. 3, a schematic structural diagram of a path planning apparatus according to an embodiment of the present invention, which includes: an acquisition module 301 and a planning module 302;
the acquisition module 301 is configured to acquire a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
the planning module 302 is configured to screen a second relative distance from the plurality of first relative distances according to the local perception condition, set the second relative distance as a local environment state, and input it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
In this embodiment, setting the second relative distance as a local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, is specifically:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
and updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches the target position, completing the path planning of the intelligent device to be planned.
In this embodiment, judging that the decision behavior is an obstacle avoidance behavior specifically includes:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, the decision behavior is judged to be an obstacle avoidance behavior; otherwise, the decision behavior is judged not to be an obstacle avoidance behavior; wherein the historical movement behavior is the movement behavior preceding the decision behavior.
In this embodiment, the apparatus is further configured for:
acquiring the real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
and when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle.
In this embodiment, obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
According to the embodiment of the invention, the acquisition module acquires a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles; the planning module screens the second relative distances from the plurality of first relative distances according to the local perception condition, sets them as the local environment state, and inputs them into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state.
According to the embodiment of the invention, after the second relative distances are obtained by screening according to the local perception condition, they are set as the local environment state and input into the neural network. This retains the key environment state while reducing environmental complexity; in a scene with high-density obstacles it shortens the time the intelligent device to be planned spends learning obstacle avoidance behavior from the environment, improves the convergence efficiency and real-time performance of the neural network, and ultimately improves the obstacle avoidance accuracy of the intelligent device to be planned.
Secondly, the embodiment of the invention introduces global guidance by adding an angle constraint, guiding the intelligent device to be planned from the global environment: a certain punishment is given when the movement angle exceeds the preset constraint angle, and an appropriate reward is given when the movement angle is smaller than it, so the device gradually learns to move within a fixed range of angles, which effectively prevents the intelligent device to be planned from getting stuck and unable to advance in the local environment.
In addition, after obstacle avoidance behaviors are screened out of the decision behaviors, they are reinforced and given corresponding reward values, so that the intelligent device to be planned quickly memorizes and learns how to avoid obstacles.
Finally, the embodiment of the invention calculates the reward value with approaching the target position as the optimization target, so that each decision step other than obstacle avoidance heads directly toward the target position; the finally planned route is therefore smooth and short, the intelligent device to be planned reaches the target position quickly, and movement efficiency is improved.
The foregoing is a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and refinements without departing from the principles of the present invention, and such modifications and refinements are also within the scope of the present invention.

Claims (4)

1. A method of path planning, comprising:
acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
after screening the plurality of first relative distances according to a local perception condition to obtain a second relative distance, setting the second relative distance as a local environment state and inputting it into a neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state;
wherein setting the second relative distance as the local environment state and inputting it into the neural network, so that the neural network performs path planning on the intelligent device to be planned according to the local environment state, specifically comprises:
the neural network includes: a decision network and a value network;
setting the second relative distance as the local environment state and inputting it into the decision network, so that the decision network calculates a decision behavior according to the local environment state;
after controlling the intelligent device to be planned to move according to the decision behavior, when the decision behavior is judged to be an obstacle avoidance behavior, reinforcement-learning the decision behavior through the value network and giving the decision behavior a first reward value;
inputting the first reward value into the value network, so that the value network calculates a policy evaluation value of the decision behavior;
updating the decision network and the value network according to the policy evaluation value until the intelligent device to be planned reaches a target position, thereby completing the path planning of the intelligent device to be planned;
further comprising:
acquiring a real-time moving direction of the intelligent device to be planned, and calculating a movement angle according to the real-time moving direction and the target position;
calculating a second reward value according to the movement angle and a preset constraint angle;
when the movement angle is smaller than the preset constraint angle, giving the second reward value to the movement angle;
wherein obtaining the second relative distance according to the local perception condition and the plurality of first relative distances is specifically:
the local perception condition includes: a first preset value;
and when a first relative distance is judged to be smaller than the first preset value, taking that first relative distance as the second relative distance.
2. The path planning method according to claim 1, wherein judging that the decision behavior is an obstacle avoidance behavior specifically comprises:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, judging the decision behavior to be an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
3. A path planning apparatus, comprising: an acquisition module and a planning module;
the intelligent device comprises an acquisition module, a first obstacle acquisition module and a second obstacle acquisition module, wherein the acquisition module is used for acquiring a plurality of first relative distances between the intelligent device to be planned and a plurality of first obstacles respectively;
the planning module is used for screening a plurality of first relative distances according to local perception conditions to obtain second relative distances, setting the second relative distances as local environment states and inputting the second relative distances into a neural network so that the neural network performs path planning on the intelligent equipment to be planned according to the local environment states;
the method comprises the steps of setting the second relative distance as a local environment state and inputting the second relative distance into a neural network, so that the neural network performs path planning on the intelligent equipment to be planned according to the local environment state, specifically comprising the following steps:
the neural network includes: decision networks and value networks;
setting the second relative distance as a local environment state and inputting the second relative distance into the decision network so that the decision network calculates and obtains decision behaviors according to the local environment state;
after the intelligent equipment to be planned is controlled to move according to the decision-making behavior, when the decision-making behavior is judged to be obstacle avoidance behavior, the decision-making behavior is strengthened and learned through the value network, and a first rewarding value is given to the decision-making behavior;
inputting the first rewards value into the value network so that the value network calculates a strategy evaluation value of the decision action;
updating the decision network and the value network according to the strategy evaluation value until the intelligent equipment to be planned reaches a target position, so as to complete path planning of the intelligent equipment to be planned;
further comprises:
acquiring a real-time moving direction of the intelligent equipment to be planned, and calculating a moving angle according to the real-time moving direction and a target position;
calculating a second prize value according to the movement angle and a preset constraint angle;
when the movement angle is smaller than the preset constraint angle, giving the second rewarding value to the movement angle;
the second relative distance is obtained according to the local perception condition and the plurality of first relative distances, specifically:
the local perceptual conditions include: a first preset value;
and when the first relative distance is judged to be smaller than the first preset value, the first relative distance is taken as the second relative distance and acquired.
4. The path planning apparatus according to claim 3, wherein judging that the decision behavior is an obstacle avoidance behavior is specifically:
if continuing to execute the historical movement behavior would collide with a first obstacle while executing the decision behavior would not, judging the decision behavior to be an obstacle avoidance behavior; otherwise, judging that the decision behavior is not an obstacle avoidance behavior.
CN202111635189.8A 2021-12-29 2021-12-29 Path planning method and device Active CN114326734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111635189.8A CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device


Publications (2)

Publication Number Publication Date
CN114326734A CN114326734A (en) 2022-04-12
CN114326734B (en) 2024-03-08

Family

ID=81016080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111635189.8A Active CN114326734B (en) 2021-12-29 2021-12-29 Path planning method and device

Country Status (1)

Country Link
CN (1) CN114326734B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168324A (en) * 2017-06-08 2017-09-15 中国矿业大学 A kind of robot path planning method based on ANFIS fuzzy neural networks
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory
CN110083165A (en) * 2019-05-21 2019-08-02 大连大学 A kind of robot paths planning method under complicated narrow environment
CN111061277A (en) * 2019-12-31 2020-04-24 歌尔股份有限公司 Unmanned vehicle global path planning method and device
CN111399541A (en) * 2020-03-30 2020-07-10 西北工业大学 Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network
CN112977146A (en) * 2021-02-24 2021-06-18 中原动力智能机器人有限公司 Charging method and system for automatic driving vehicle and charging pile
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Dynamic obstacle avoidance method for carrier-based aircraft based on deep reinforcement learning; 薛均晓 et al.; Journal of Computer-Aided Design & Computer Graphics; 1102-1112 *
Robot path planning method based on improved DDPG in dense obstacle scenarios; 薛均晓 et al.; Proceedings of the 2022 China Automation Congress; 1-6 *
Genetic multi-point path planning for mobile robots; 梁宏伟 et al.; Engineering Technology; 584-587 *

Also Published As

Publication number Publication date
CN114326734A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN112937564B (en) Lane change decision model generation method and unmanned vehicle lane change decision method and device
US10324469B2 (en) System and method for controlling motion of vehicle in shared environment
WO2020243162A1 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
CN111487864B (en) Robot path navigation method and system based on deep reinforcement learning
JP2022506404A (en) Methods and devices for determining vehicle speed
WO2021208771A1 (en) Reinforced learning method and device
US11604469B2 (en) Route determining device, robot, and route determining method
CN112180950B (en) Intelligent ship autonomous collision avoidance and path planning method based on reinforcement learning
Rhinehart et al. Contingencies from observations: Tractable contingency planning with learned behavior models
CN113359853B (en) Route planning method and system for unmanned aerial vehicle formation cooperative target monitoring
CN113359859B (en) Combined navigation obstacle avoidance method, system, terminal equipment and storage medium
CN113189989B (en) Vehicle intention prediction method, device, equipment and storage medium
CN117093009B (en) Logistics AGV trolley navigation control method and system based on machine vision
CN115204044A (en) Method, apparatus and medium for generating trajectory prediction model and processing trajectory information
CN110288708A (en) A kind of map constructing method, device and terminal device
CN108121347B (en) Method and device for controlling movement of equipment and electronic equipment
Lee et al. Spatiotemporal costmap inference for MPC via deep inverse reinforcement learning
CN116300909A (en) Robot obstacle avoidance navigation method based on information preprocessing and reinforcement learning
CN116679711A (en) Robot obstacle avoidance method based on model-based reinforcement learning and model-free reinforcement learning
CN114326734B (en) Path planning method and device
Jin et al. Safe-Nav: learning to prevent PointGoal navigation failure in unknown environments
WO2023242223A1 (en) Motion prediction for mobile agents
Wang et al. Tracking moving target for 6 degree-of-freedom robot manipulator with adaptive visual servoing based on deep reinforcement learning PID controller
Ward et al. Towards risk minimizing trajectory planning in on-road scenarios
CN113158539A (en) Method for long-term trajectory prediction of traffic participants

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant