CN115493595A - AUV path planning method based on local perception and near-end optimization strategy - Google Patents
AUV path planning method based on local perception and near-end optimization strategy
- Publication number
- CN115493595A (application CN202211219574.9A)
- Authority
- CN
- China
- Prior art keywords
- auv
- network
- path planning
- optimization strategy
- ocean
- Prior art date
- 2022-09-28
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
- G01C21/203—Specially adapted for sailing ships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
As the current mode of ocean exploration shifts toward intelligence and informatization in pursuit of lower operational risk and longer navigation time, the importance of underwater unmanned exploration systems is increasingly highlighted, and path planning that considers ocean currents and obstacles has become a prerequisite for underwater AUV navigation. Addressing the situation in which ocean-current factors are ignored and local obstacle information is not used effectively, the invention provides an AUV path planning method that combines a near-end optimization strategy (proximal policy optimization, PPO) algorithm with local perception. By constructing an underwater ocean-current environment, building the neural network structure of the near-end optimization strategy, and designing a reward function that considers multiple factors, a complete workflow for AUV underwater path planning is obtained. The method was verified experimentally and can be widely applied to real-time path planning of underwater AUVs.
Description
Technical Field
The invention belongs to the field of AUV autonomous path planning, and particularly relates to an AUV path planning method that considers the influence of ocean currents and is based on local perception and a near-end optimization strategy (proximal policy optimization, PPO).
Background
As the current mode of ocean detection shifts toward intelligence and informatization in pursuit of lower operational risk and longer navigation time, the importance of underwater unmanned detection systems is increasingly highlighted. The AUV is an important component of the underwater unmanned combat system, and path planning is a key technology for the AUV to complete combat missions safely and effectively: constraints such as ocean currents, obstacle avoidance, and the vehicle's own performance must be considered, while indexes such as energy consumption, navigation time, and safe concealment are pursued to be optimal.
Current common path planning methods mainly include directed-graph search methods, heuristic search algorithms, the artificial potential field method, rapidly-exploring random tree methods, and the like. For the AUV path planning problem in a large-scale area, rapidly obtaining a path that meets the requirements matters more than spending a large amount of time solving for the optimal path, and reinforcement learning has become a research hotspot for path planning algorithms owing to its intelligence and capacity for dynamic learning.
Reinforcement learning mainly comprises an agent, an environment, states, actions, and rewards. After the agent performs an action, it observes a new state, and the environment emits a reward signal for this state transition. Based on the reward and the environmental feedback from the new state, the agent then performs a new action according to its current policy. By continuously optimizing its policy in this way, the agent can eventually take optimal actions in different states. The near-end optimization strategy (PPO) algorithm is a policy-based reinforcement learning algorithm used to solve action selection in multi-dimensional action spaces.
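For illustration only, the following Python sketch shows one episode of the agent-environment loop described above; the Gym-style `env` and the `policy` object, like all names here, are assumptions rather than part of the disclosure:

```python
def run_episode(env, policy, max_steps=1000):
    """One agent-environment episode: observe a state, sample an action
    from the current policy, receive a reward and the next state."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy.sample(state)                  # act per current policy
        next_state, reward, done = env.step(action)    # environment feedback
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:                                       # e.g. target reached
            break
    return trajectory
```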
Disclosure of Invention
The invention aims to solve the technical problem of providing an AUV path planning method that combines a near-end optimization strategy algorithm with local perception. The technical scheme of the invention is as follows:
1. Acquire obstacle information and ocean current information, and construct a three-dimensional environment from them.
2. Construct a critic network for evaluating actions and an actor network for outputting actions, and initialize the network parameters.
3. Select an action according to the output of the neural network, acquire a sample, and put the sample into an experience pool for later learning.
4. The reward function in the sample is calculated as follows:
R_d = arctan(k_1(ξ_t − ξ_{t+1} − δ_d))

where ξ_t denotes the distance between the AUV's current position and the target point, and δ_d is a bias term that makes it harder for the AUV to obtain a positive reward.
The reward R_c associated with ocean currents is set according to the ratio of the AUV's actual (over-ground) speed to its own speed through the water. When the target is reachable and the current has a positive influence on the AUV's motion, the actual speed should be greater than the AUV's own speed. The parameter τ_c is typically set to 0.5 to encourage the AUV to make greater use of the current. R_c decreases as the angle between the AUV's motion and the current increases, and increases with the current's speed. When the current has a negative influence or is poorly utilized, the formula penalizes the agent through the bias term δ_c.
The final reward function is: R = k_1·R_d + k_2·R_c.
5. The critic network and the actor network are trained using the samples. The actor network is updated with the clipped surrogate objective

L^{CLIP}(θ) = E_t[ min( r_t(θ)·Â_t , clip(r_t(θ), 1−ε, 1+ε)·Â_t ) ], where r_t(θ) = π_θ(a_t|s_t) / π_{θold}(a_t|s_t).

When the advantage estimate Â_t is greater than 0, the network parameters are optimized toward increasing the output probability of this action, but only up to r_t(θ) = 1+ε; conversely, when Â_t is less than 0, the parameters are optimized toward decreasing the action's probability, down to r_t(θ) = 1−ε. This essentially controls the magnitude of policy updates.
In the near-end optimization strategy, the advantage function is estimated using the temporal-difference (TD) error. The single-step TD error is defined as the difference between the discounted return and the critic network's state-value estimate:

δ_t = r_{t+1} + γ·V_β(s_{t+1}) − V_β(s_t).

The advantage estimate is the N-step TD error, i.e. the discounted accumulation of the single-step errors: Â_t = Σ_{i=0}^{N−1} γ^i·δ_{t+i}.
The critic network is updated by minimizing: L^{VF}(β) = (V_β(s_t) − V_t^{targ})².
6. The parameters of both networks are updated in the manner of step 5, and actions are selected by sampling according to the output probability distribution. The above sampling and network-updating process is repeated until a specified maximum number of rounds is reached; each round ends when the maximum number of steps is reached or the target point is reached. Finally, the path is output.
The near-end optimization strategy algorithm adopted by the invention comprises two networks: a critic network, which evaluates the value of actions, and an actor network, which is responsible for outputting actions. Each sample can be learned from multiple times, converting on-policy learning into off-policy learning and thereby improving the utilization of samples in the experience pool. The input of the invention is the joint input of relative position information and a description of local obstacles, which provides both global guidance and local perception. By outputting a probability distribution, the network can still converge in a multi-dimensional action space.
Drawings
FIG. 1 is a block diagram of the method.
FIG. 2 shows the experimental results.
Detailed Description
The method mainly comprises the following steps: input processing, network initialization, reward-function design, network updating, and decision making. Fig. 1 presents a block diagram of the proposed method.
An AUV path planning method based on a near-end optimization strategy algorithm comprises the following steps:
1. and (5) environment construction. Ocean current and depth data of 122.75 degrees E-130.75 degrees E and 15.25 degrees N-23.625 degrees N are downloaded from a national ocean data center, and the maximum depth is 6400m. A coordinate system is established by taking (122.75 degrees E,15.25 degrees N and 6400 m) as a coordinate origin, the target point is (130.75 degrees E,23.625 degrees N and 6400 m), and the navigational speed of the AUV is 1.5m/s.
2. The state input comprises three parts: position information, ocean current information, and local environment information. The position information is input as relative coordinates (g_x − x, g_y − y, g_z − z), where (g_x, g_y, g_z) are the coordinates of the target point and (x, y, z) are the coordinates of the current position. The ocean current information is obtained at the current position and represented as (u, v, w). The local environment information is sensed by a sensor and converted into a 0/1 matrix, where 0 represents an obstacle and 1 represents safety. The perception range is 3 unit lengths, and the perception matrix has dimension 3 × 3.
3. Through neural network processing, the local perception input is reduced to a 1 × 3 input, which is concatenated with the position and ocean current information to form the final input (see the encoder sketch below).
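A minimal PyTorch sketch of this joint input, under the assumption that the "1 × 3" reduction of the perception matrix is a single learned linear layer (the patent does not specify the layer):

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Joint input of steps 2-3. Mapping the flattened 0/1 perception
    matrix to 3 features with one linear layer is an assumed reading of
    'changed into a 1 x 3 input through neural network processing'."""

    def __init__(self, perception_cells=9):          # 3 x 3 perception matrix
        super().__init__()
        self.reduce = nn.Linear(perception_cells, 3)

    def forward(self, rel_goal, current_uvw, perception):
        # rel_goal: (3,) relative position to the target; current_uvw: (3,);
        # perception: 0/1 obstacle matrix (0 = obstacle, 1 = safe).
        local = torch.tanh(self.reduce(perception.flatten().float()))
        return torch.cat([rel_goal, current_uvw, local])   # 9-dim final input
```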
4. An actor neural network is constructed to output the policy; its parameters are denoted α. Its final layer outputs a 27-dimensional vector through softmax, and actions are sampled according to this probability distribution. A critic neural network is constructed to output the value of actions; its parameters are denoted β, and its structure is the same as the actor network except for the last layer.
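A hedged sketch of the two networks; the hidden-layer sizes and activations are assumptions, while the 27-way softmax output and the shared body differing only in the last layer follow the description:

```python
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: softmax over the 27 discrete actions (step 4).
    Hidden sizes and activations are assumptions."""
    def __init__(self, state_dim=9, n_actions=27, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )
    def forward(self, s):
        return self.net(s)                 # action probabilities

class Critic(nn.Module):
    """Value network: same body as the actor except for the last layer,
    which outputs the scalar state value V_beta(s)."""
    def __init__(self, state_dim=9, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s):
        return self.net(s).squeeze(-1)
```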
5. The input from step 3 is fed into the actor network, which outputs an action a_t. The AUV executes the current action and, under the influence of ocean currents, arrives at a new state s_{t+1} and obtains a reward r_t. The current sample (s_t, a_t, r_t, s_{t+1}) is stored in the experience pool. This process is repeated until the current round ends; a round ends when the target is reached or the maximum number of steps (2000) is reached.
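A sketch of this rollout step under the assumptions above; the 27 actions are taken to be the 3³ unit motion directions (an assumption), and `encode`, `reward_fn`, and `env.start` are assumed helpers corresponding to the encoder sketch above and the reward sketch below:

```python
import itertools
import numpy as np
import torch

# 27 = 3^3 motion directions, including "hold" (an assumed action set);
# diagonal directions are normalized to unit length.
dirs = np.array(list(itertools.product([-1, 0, 1], repeat=3)), dtype=float)
norms = np.linalg.norm(dirs, axis=1, keepdims=True)
norms[norms == 0.0] = 1.0
ACTIONS = dirs / norms

def collect_episode(env, actor, encode, reward_fn, max_steps=2000):
    """Step 5: sample from the actor's categorical output, move under
    current influence, and store (s_t, a_t, r_t, s_{t+1}) transitions."""
    pos, buffer = env.start.copy(), []
    for _ in range(max_steps):                       # 2000-step round limit
        s = encode(env, pos)
        a = torch.multinomial(actor(s), 1).item()    # sample by probability
        new_pos = env.step(pos, ACTIONS[a])          # current-influenced move
        buffer.append((s, a, reward_fn(env, pos, new_pos), encode(env, new_pos)))
        pos = new_pos
        if np.linalg.norm(pos - env.goal) < 1.0:     # round ends at the target
            break
    return buffer
```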
6. The reward function is set as follows:
R_d = arctan(k_1(ξ_t − ξ_{t+1} − δ_d))

where ξ_t denotes the distance between the AUV's current position and the target point, and δ_d is a bias term that makes it harder for the AUV to obtain a positive reward.
The reward R_c associated with ocean currents is set according to the ratio of the AUV's actual (over-ground) speed to its own speed through the water. When the target is reachable and the current has a positive influence on the AUV's motion, the actual speed should be greater than the AUV's own speed. The parameter τ_c is typically set to 0.5 to encourage the AUV to make greater use of the current. R_c decreases as the angle between the AUV's motion and the current increases, and increases with the current's speed. When the current has a negative influence or is poorly utilized, the formula penalizes the agent through the bias term δ_c.
The final reward function is: R = k_1·R_d + k_2·R_c, where k_1 = 1 and k_2 = 0.5.
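The following Python sketch computes this reward. R_d follows the stated formula; the exact expression for R_c is not given in the text, so the ratio-based form below is only an assumption consistent with its description, and δ_d, δ_c are assumed values:

```python
import numpy as np

def reward(xi_t, xi_t1, v_auv, v_cur,
           k1=1.0, k2=0.5, delta_d=0.1, tau_c=0.5, delta_c=0.1):
    """Multi-factor reward of step 6. R_d follows the stated formula;
    the R_c expression is an assumed form consistent with the text
    (larger when the current helps, a delta_c penalty when it hinders).
    delta_d and delta_c are assumed values; k1 = 1, k2 = 0.5 are stated."""
    # Distance-progress term: the bias delta_d makes a positive reward
    # harder to obtain.
    r_d = np.arctan(k1 * (xi_t - xi_t1 - delta_d))

    # Current-utilization term from the ratio of the actual (over-ground)
    # speed to the AUV's own speed; the angle between the AUV's motion and
    # the current enters through the vector sum.
    ratio = np.linalg.norm(np.asarray(v_auv) + np.asarray(v_cur)) / np.linalg.norm(v_auv)
    r_c = (ratio - 1.0) if ratio > tau_c else -delta_c

    return k1 * r_d + k2 * r_c
```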
7. After a round ends, if the number of samples has reached the designated capacity of 1000, updating begins; otherwise sampling continues. The update formulas are as follows:
The critic network and the actor network are trained using the samples. The actor network is updated with the clipped surrogate objective

L^{CLIP}(θ) = E_t[ min( r_t(θ)·Â_t , clip(r_t(θ), 1−ε, 1+ε)·Â_t ) ], where r_t(θ) = π_θ(a_t|s_t) / π_{θold}(a_t|s_t).

When the advantage estimate Â_t is greater than 0, the network parameters are optimized toward increasing the output probability of this action, but only up to r_t(θ) = 1+ε; conversely, when Â_t is less than 0, the parameters are optimized toward decreasing the action's probability, down to r_t(θ) = 1−ε. This essentially controls the magnitude of policy updates.
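A minimal PyTorch sketch of this clipped actor loss (the names are illustrative):

```python
import torch

def ppo_actor_loss(new_probs, old_probs, advantages, eps=0.3):
    """Clipped surrogate loss: r_t(theta) is the probability ratio of the
    chosen action under the new vs. old policy; clipping at 1 +/- eps
    bounds the update magnitude exactly as described (eps = 0.3 here)."""
    ratio = new_probs / old_probs                         # r_t(theta)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # Maximizing the surrogate objective = minimizing its negation.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```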
In the near-end optimization strategy, the advantage function is estimated using the temporal-difference (TD) error. The single-step TD error is defined as the difference between the discounted return and the critic network's state-value estimate:

δ_t = r_{t+1} + γ·V_β(s_{t+1}) − V_β(s_t).

The advantage estimate is the N-step TD error, i.e. the discounted accumulation of the single-step errors: Â_t = Σ_{i=0}^{N−1} γ^i·δ_{t+i}.
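A sketch of the N-step advantage computed by accumulating discounted single-step TD errors; the discount γ = 0.99 is an assumed value, as the text does not state it:

```python
import torch

def n_step_advantages(rewards, values, gamma=0.99):
    """Advantage estimate as accumulated single-step TD errors
    delta_t = r_{t+1} + gamma*V(s_{t+1}) - V(s_t). `rewards` has length T,
    `values` length T+1 (critic estimates incl. the final state)."""
    deltas = rewards + gamma * values[1:] - values[:-1]   # single-step TD errors
    adv, out = 0.0, torch.zeros_like(deltas)
    for t in reversed(range(len(deltas))):                # N-step accumulation
        adv = deltas[t] + gamma * adv
        out[t] = adv
    return out
```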
The critic network is updated by minimizing: L^{VF}(β) = (V_β(s_t) − V_t^{targ})².
Here ε is set to 0.3 and the learning rate is 0.001.
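Putting the two updates together, a hedged sketch of one update step over a sampled batch (the batch layout and the single epoch per batch are assumptions; `ppo_actor_loss` is the sketch above, and the optimizers would use the stated learning rate of 0.001):

```python
import torch

def update(actor, critic, opt_actor, opt_critic, batch, eps=0.3):
    """One update over a sampled batch (s, a, old_prob, v_target, advantage)."""
    s, a, old_p, v_targ, adv = batch
    # Critic: squared error between V_beta(s) and the target value.
    v_loss = ((critic(s) - v_targ) ** 2).mean()
    opt_critic.zero_grad(); v_loss.backward(); opt_critic.step()
    # Actor: clipped surrogate loss (see ppo_actor_loss above).
    new_p = actor(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    a_loss = ppo_actor_loss(new_p, old_p, adv.detach(), eps)
    opt_actor.zero_grad(); a_loss.backward(); opt_actor.step()
```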
The experimental results are shown in Fig. 2: the planned path length is 610.38 km and the travel time is 337413 s.
Claims (4)
1. An AUV path planning method based on local perception and near-end optimization strategies, the path planning method comprising:
(1) Obtaining obstacle information and ocean current information, and constructing a three-dimensional environment according to the information;
(2) Constructing a critic network for evaluating the action and an actor network for outputting the action, and initializing network parameters;
(3) Selecting an action according to the output of the neural network, acquiring a sample, and putting the sample into an experience pool for later learning;
(4) Designing a reward function considering a plurality of factors;
(5) Training is performed using samples of the experience pool until a maximum number of rounds is reached, outputting a path.
2. The AUV path planning method based on local perception and near-end optimization strategy as claimed in claim 1, wherein the reward function calculation formula in step (4) is as follows:
R_d = arctan(k_1(ξ_t − ξ_{t+1} − δ_d))

where ξ_t denotes the distance between the AUV's current position and the target point, and δ_d is a bias term that makes it harder for the AUV to obtain a positive reward.
The reward R_c associated with ocean currents is set according to the ratio of the AUV's actual (over-ground) speed to its own speed through the water. When the target is reachable and the current has a positive influence on the AUV's motion, the actual speed should be greater than the AUV's own speed. The parameter τ_c is typically set to 0.5 to encourage the AUV to make greater use of the current. R_c decreases as the angle between the AUV's motion and the current increases, and increases with the current's speed. When the current has a negative influence or is poorly utilized, the formula penalizes the agent through the bias term δ_c.
The final reward function is set as: R = k_1·R_d + k_2·R_c.
3. The AUV path planning method based on local perception and near-end optimization strategy as claimed in claim 1, wherein in the step (4), a critic network and an actor network are constructed and trained using the samples, and the actor network is updated with the clipped surrogate objective

L^{CLIP}(θ) = E_t[ min( r_t(θ)·Â_t , clip(r_t(θ), 1−ε, 1+ε)·Â_t ) ], where r_t(θ) = π_θ(a_t|s_t) / π_{θold}(a_t|s_t).

When the advantage estimate Â_t is greater than 0, the network parameters are optimized toward increasing the output probability of this action, but only up to r_t(θ) = 1+ε; conversely, when Â_t is less than 0, the parameters are optimized toward decreasing the action's probability, down to r_t(θ) = 1−ε. This essentially controls the magnitude of policy updates.
In the near-end optimization strategy, the advantage function is estimated using the temporal-difference (TD) error. The single-step TD error is defined as the difference between the discounted return and the critic network's state-value estimate:

δ_t = r_{t+1} + γ·V_β(s_{t+1}) − V_β(s_t).

The advantage estimate is the N-step TD error, i.e. the discounted accumulation of the single-step errors.
4. The AUV path planning method based on local perception and near-end optimization strategy as claimed in claim 1, wherein in the step (5) the parameters of the two networks are updated and actions are sampled and selected according to the output probability distribution; the sampling and network-updating process is repeated until a specified maximum number of rounds is reached, and each round ends when the maximum number of steps is reached or the target point is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211219574.9A CN115493595A (en) | 2022-09-28 | 2022-09-28 | AUV path planning method based on local perception and near-end optimization strategy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211219574.9A CN115493595A (en) | 2022-09-28 | 2022-09-28 | AUV path planning method based on local perception and near-end optimization strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115493595A true CN115493595A (en) | 2022-12-20 |
Family
ID=84472697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211219574.9A Pending CN115493595A (en) | 2022-09-28 | 2022-09-28 | AUV path planning method based on local perception and near-end optimization strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115493595A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110794842A (en) * | 2019-11-15 | 2020-02-14 | 北京邮电大学 | Reinforced learning path planning algorithm based on potential field |
CN111829527A (en) * | 2020-07-23 | 2020-10-27 | 中国石油大学(华东) | Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements |
CN112241176A (en) * | 2020-10-16 | 2021-01-19 | 哈尔滨工程大学 | Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment |
CN112698646A (en) * | 2020-12-05 | 2021-04-23 | 西北工业大学 | Aircraft path planning method based on reinforcement learning |
CN113159432A (en) * | 2021-04-28 | 2021-07-23 | 杭州电子科技大学 | Multi-agent path planning method based on deep reinforcement learning |
CN113532457A (en) * | 2021-06-07 | 2021-10-22 | 山东师范大学 | Robot path navigation method, system, device and storage medium |
CN113534668A (en) * | 2021-08-13 | 2021-10-22 | 哈尔滨工程大学 | Maximum entropy based AUV (autonomous Underwater vehicle) motion planning method for actor-critic framework |
Non-Patent Citations (1)
Title |
---|
Jiachen Yang et al.: "A Time-Saving Path Planning Scheme for Autonomous Underwater Vehicles With Complex Underwater Conditions", IEEE Internet of Things Journal, 12 September 2022, pages 1001-1013 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |