CN115493595A - AUV path planning method based on local perception and near-end optimization strategy - Google Patents

AUV path planning method based on local perception and near-end optimization strategy

Info

Publication number
CN115493595A
CN115493595A (Application CN202211219574.9A)
Authority
CN
China
Prior art keywords
auv
network
path planning
optimization strategy
ocean
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211219574.9A
Other languages
Chinese (zh)
Inventor
杨嘉琛
霍紫强
霍佳明
肖帅
***
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202211219574.9A priority Critical patent/CN115493595A/en
Publication of CN115493595A publication Critical patent/CN115493595A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G01C21/203 Specially adapted for sailing ships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

As ocean exploration shifts toward intelligent, information-driven operation in pursuit of lower mission risk and longer endurance, unmanned underwater exploration systems are becoming increasingly important, and path planning that accounts for ocean currents and obstacles has become a prerequisite for AUV underwater navigation. Addressing the shortcomings that ocean-current factors are often ignored and local obstacle information is not used effectively, the invention provides an AUV path planning method that combines a near-end optimization strategy algorithm with local perception. By constructing an underwater ocean-current environment, building the neural network structure of the near-end optimization strategy, and designing a reward function that considers multiple factors, a complete procedure for AUV underwater path planning is obtained. The method has been verified experimentally and can be widely applied to real-time path planning of underwater AUVs.

Description

AUV path planning method based on local perception and near-end optimization strategy
Technical Field
The invention belongs to the field of autonomous AUV path planning, and in particular relates to an AUV path planning method that considers ocean-current influence and is based on local perception and a near-end optimization strategy (proximal policy optimization, PPO).
Background
As ocean detection shifts toward intelligent, information-driven operation in pursuit of lower mission risk and longer endurance, unmanned underwater detection systems are becoming increasingly important. The AUV is a key component of underwater unmanned systems, and path planning is an essential technology for it to complete missions safely and effectively: constraints such as ocean currents, obstacle avoidance and the vehicle's own performance must be considered, while indicators such as energy consumption, voyage time, and safety and concealment are to be optimized.
Common path planning methods currently include directed-graph search methods, heuristic search algorithms, the artificial potential field method, and rapidly-exploring random tree methods. For AUV path planning over large-scale areas, quickly obtaining a path that meets the requirements matters more than spending a large amount of time solving for the optimal path, and reinforcement learning has become a research hotspot for path planning owing to its adaptability and capacity for dynamic learning.
Reinforcement learning involves an agent, an environment, states, actions and rewards. After the agent performs an action, it observes a new state, and the environment returns a reward signal for the resulting state transition. Based on the reward and the environmental feedback in the new state, the agent then performs the next action according to its current policy. By continually optimizing its policy in this way, the agent eventually learns to take the best action in each state. The near-end optimization strategy algorithm is a policy-based reinforcement learning algorithm suited to action selection in multi-dimensional action spaces.
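As a purely illustrative sketch of this interaction loop (a toy one-dimensional environment and tabular Q-learning, not the patented method or its near-end optimization strategy algorithm), the agent-environment cycle can be written as:

```python
import random

# Toy illustration of the agent-environment loop described above (not the patented
# method): an agent on a 1-D line learns, via tabular Q-learning, to walk to a goal.
class LineWorld:
    def __init__(self, size=10):
        self.size, self.pos = size, 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                       # action: 0 = move left, 1 = move right
        self.pos = max(0, min(self.size - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.size - 1
        return self.pos, (1.0 if done else -0.01), done   # new state, reward signal, end flag

env = LineWorld()
q = [[0.0, 0.0] for _ in range(env.size)]         # state-action value table (the "policy")
alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # act according to the current policy (epsilon-greedy over the value table)
        action = random.randrange(2) if random.random() < eps else q[state].index(max(q[state]))
        next_state, reward, done = env.step(action)        # environmental feedback
        # improve the policy from the observed transition
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state
```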
Disclosure of Invention
The technical problem that the invention aims to solve is to provide an AUV path planning method that combines a near-end optimization strategy algorithm with local perception. The technical scheme of the invention is as follows:
1. Acquire obstacle information and ocean-current information, and construct a three-dimensional environment from this information;
2. Construct a critic network for evaluating actions and an actor network for outputting actions, and initialize the network parameters;
3. Select an action according to the output of the neural network, acquire a sample, and put the sample into the experience pool for later learning;
4. The reward function in the sample is calculated as follows:
R_d = arctan(k1·(ξ_t − ξ_{t+1}) − δ_d)
where ξ_t denotes the distance between the AUV's current position and the target point, and δ_d is a bias term that makes it harder for the AUV to obtain a positive reward.
The reward R_c associated with the ocean current is set according to the ratio of the actual speed (the AUV velocity combined with the current) to the AUV's own speed. [The piecewise formula for R_c appears only as equation images in the original publication.] When the target is reachable and the current has a positive influence on the AUV's motion, the actual speed should be greater than the AUV speed. The parameter τ_c is typically set to 0.5 to encourage the AUV to make greater use of the current. R_c decreases as the angle between the current and the AUV's motion increases, and increases with the current speed. When the current has a negative influence or is poorly utilized, the formula penalizes the agent through δ_c.
The final reward function is: r = k 1R d +k2*R c
5. The critic network and the actor network are trained using the samples; the update rule for the actor network is derived as follows.
The objective function is the clipped surrogate objective
L^CLIP(θ) = E_t[ min( r_t(θ)·Â_t, clip(r_t(θ), 1 − ε, 1 + ε)·Â_t ) ],
where r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t) is the probability ratio between the new and old policies and Â_t is the advantage estimate.
The gradient of the objective function with respect to the actor parameters θ is ∇_θ L^CLIP(θ), and the actor network is updated by gradient ascent:
θ ← θ + α·∇_θ L^CLIP(θ),
where α is the learning rate.
When the advantage estimate Â_t is greater than 0, the network parameters are optimized toward increasing the probability of the selected action, but only up to r_t(θ) = 1 + ε; conversely, when Â_t is less than 0, the parameters are optimized toward decreasing the probability of the action, down to r_t(θ) = 1 − ε. This essentially limits the magnitude of each policy update.
In the near-end optimization strategy, the advantage function is estimated with temporal-difference errors. The single-step TD-error is defined as the difference between the discounted bootstrapped return and the critic network's state-value estimate:
δ_t = r_{t+1} + γ·V_β(s_{t+1}) − V_β(s_t).
The advantage estimate is the N-step accumulation of TD-errors:
Â_t = Σ_{l=0}^{N−1} γ^l·δ_{t+l}.
The critic network is updated by minimizing
L^VF(β) = (V_β(s_t) − V_t^targ)^2.
6. The parameters of both networks are updated in the manner described in step 5, and actions are sampled and selected according to the output probability distribution. The sampling and network-updating process is repeated until the specified maximum number of rounds is reached. A round ends when the maximum number of steps or the target point is reached; finally, the path is output.
The near-end optimization strategy algorithm adopted by the invention comprises two networks: a critic network and an actor network. The critic network evaluates the value of an action while the actor network is responsible for outputting the action; each sample can be learned from multiple times, converting on-policy learning into off-policy learning and thereby improving the utilization of samples in the experience pool. The input of the invention is a joint input of relative position information and a description of local obstacles, providing both global guidance and local perception. By outputting a probability distribution, the network can still converge in a multi-dimensional action space.
Drawings
FIG. 1 is a block diagram of the method.
FIG. 2 shows the experimental result.
Detailed Description
The method mainly comprises the following steps: input processing, network initialization, reward-function design, network updating, and decision making. FIG. 1 presents a block diagram of the proposed method.
An AUV path planning method based on a near-end optimization strategy algorithm comprises the following steps:
1. and (5) environment construction. Ocean current and depth data of 122.75 degrees E-130.75 degrees E and 15.25 degrees N-23.625 degrees N are downloaded from a national ocean data center, and the maximum depth is 6400m. A coordinate system is established by taking (122.75 degrees E,15.25 degrees N and 6400 m) as a coordinate origin, the target point is (130.75 degrees E,23.625 degrees N and 6400 m), and the navigational speed of the AUV is 1.5m/s.
2. The state input comprises three parts: position information, ocean-current information, and local environment information. The position information is supplied as relative position coordinates between the current position and the target point, where (g_x, g_y, g_z) are the coordinates of the target point and (x, y, z) are the coordinates of the current position. The ocean-current information is obtained at the current position and is represented as (u, v, w). The local environment information is sensed by a sensor and converted into a 0/1 matrix, where 0 represents an obstacle and 1 represents a safe cell. The sensing range is 3 unit lengths and the perception matrix is 3 × 3.
3. The local perception input is transformed into a 1 × 3 vector by a neural network and concatenated with the position and ocean-current information to form the final input.
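For illustration, the joint input of steps 2 and 3 might be assembled as follows. The encoder's hidden size and the use of a plain fully connected layer are assumptions, and treating the relative position as the coordinate difference to the target is likewise an assumption consistent with the description.

```python
import torch
import torch.nn as nn

class PerceptionEncoder(nn.Module):
    """Compresses the 3 x 3 local 0/1 obstacle matrix into a 1 x 3 feature vector,
    as in step 3 (hidden size 16 is an assumed value)."""
    def __init__(self, n_cells=9):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_cells, 16), nn.ReLU(), nn.Linear(16, 3))

    def forward(self, local_map):
        return self.net(local_map.flatten(start_dim=1))

def build_state(goal, pos, current_uvw, local_map, encoder):
    """Joint state input: relative position to the target (assumed here to be the
    coordinate difference), ocean current (u, v, w) at the current position, and
    the encoded local-perception feature."""
    rel = torch.as_tensor(goal, dtype=torch.float32) - torch.as_tensor(pos, dtype=torch.float32)
    cur = torch.as_tensor(current_uvw, dtype=torch.float32)
    grid = torch.as_tensor(local_map, dtype=torch.float32).unsqueeze(0)   # add batch dim
    feat = encoder(grid).squeeze(0)
    return torch.cat([rel, cur, feat])          # 3 + 3 + 3 = 9-dimensional final input
```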
4. An actor neural network that outputs the policy is constructed, with its parameters denoted α; its final layer outputs a 27-dimensional vector through softmax, from which actions are sampled according to the probability distribution. A critic neural network that outputs the value of an action is constructed, with its parameters denoted β; except for the last layer, its structure is the same as that of the actor network.
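Continuing the sketch, the actor and critic of step 4 could be small fully connected networks. The hidden-layer sizes are assumptions; the 27-way softmax output and the shared structure up to the last layer follow the description.

```python
import torch.nn as nn

class Actor(nn.Module):
    """Policy network (parameters alpha): outputs a 27-dimensional action
    probability distribution through softmax."""
    def __init__(self, state_dim=9, n_actions=27, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1))

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Value network (parameters beta): same structure as the actor except the last
    layer, which outputs a single state-value estimate."""
    def __init__(self, state_dim=9, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s)
```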
5. The input from step 3 is fed into the actor network, which outputs an action a_t. The AUV executes the current action and, under the influence of the ocean current, reaches a new state s_{t+1} and obtains a reward r_t. The current sample (s_t, a_t, r_t, s_{t+1}) is stored in the experience pool, and this process is repeated until the current round ends. A round ends when the target is reached or the maximum number of steps, 2000, is reached.
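The sampling procedure of step 5 could then be sketched as follows; the `env` object and its `reset`/`step` interface are hypothetical stand-ins for the constructed ocean-current environment, and `build_state_fn` wraps the state assembly shown earlier.

```python
import torch

def collect_episode(env, actor, build_state_fn, max_steps=2000):
    """Roll out one round: sample actions from the actor's output distribution, let the
    AUV move under the influence of the ocean current, and store the transitions
    (s_t, a_t, r_t, s_{t+1}) together with the old log-probabilities."""
    experience = []
    obs = env.reset()                                      # hypothetical environment interface
    for _ in range(max_steps):
        s = build_state_fn(obs)
        dist = torch.distributions.Categorical(probs=actor(s))
        a = dist.sample()                                  # sample from the output distribution
        next_obs, r, done = env.step(a.item())             # AUV moves under ocean-current influence
        experience.append((s, a, r, build_state_fn(next_obs), dist.log_prob(a).detach()))
        obs = next_obs
        if done:                                           # target reached before max steps
            break
    return experience
```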
6. The reward function is set as follows:
R_d = arctan(k1·(ξ_t − ξ_{t+1}) − δ_d)
where ξ_t denotes the distance between the AUV's current position and the target point, and δ_d is a bias term that makes it harder for the AUV to obtain a positive reward.
The reward R_c associated with the ocean current is set according to the ratio of the actual speed (the AUV velocity combined with the current) to the AUV's own speed. [The piecewise formula for R_c appears only as equation images in the original publication.] When the target is reachable and the current has a positive influence on the AUV's motion, the actual speed should be greater than the AUV speed. The parameter τ_c is typically set to 0.5 to encourage the AUV to make greater use of the current. R_c decreases as the angle between the current and the AUV's motion increases, and increases with the current speed. When the current has a negative influence or is poorly utilized, the formula penalizes the agent through δ_c.
The final reward function is: r = k 1R d +k2*R c Wherein k1=1 and k2=0.5.
7. After the round is finished, if the number of samples reaches the designated capacity of 1000, updating is started; if not, sampling is continued. The update formula is as follows:
The critic network and the actor network are trained using the samples; the update rule for the actor network is derived as follows.
The objective function is the clipped surrogate objective
L^CLIP(θ) = E_t[ min( r_t(θ)·Â_t, clip(r_t(θ), 1 − ε, 1 + ε)·Â_t ) ],
where r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t) is the probability ratio between the new and old policies and Â_t is the advantage estimate.
The gradient of the objective function with respect to the actor parameters θ is ∇_θ L^CLIP(θ), and the actor network is updated by gradient ascent:
θ ← θ + α·∇_θ L^CLIP(θ),
where α is the learning rate.
When the advantage estimate Â_t is greater than 0, the network parameters are optimized toward increasing the probability of the selected action, but only up to r_t(θ) = 1 + ε; conversely, when Â_t is less than 0, the parameters are optimized toward decreasing the probability of the action, down to r_t(θ) = 1 − ε. This essentially limits the magnitude of each policy update.
In the near-end optimization strategy, the advantage function is estimated with temporal-difference errors. The single-step TD-error is defined as the difference between the discounted bootstrapped return and the critic network's state-value estimate:
δ_t = r_{t+1} + γ·V_β(s_{t+1}) − V_β(s_t).
The advantage estimate is the N-step accumulation of TD-errors:
Â_t = Σ_{l=0}^{N−1} γ^l·δ_{t+l}.
The critic network is updated by minimizing
L^VF(β) = (V_β(s_t) − V_t^targ)^2.
Here ε is set to 0.3 and the learning rate α is 0.001.
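Tying these pieces together, the overall sampling-and-updating loop of step 7 might be driven as follows. The Adam optimizer and the `max_rounds` value are assumptions; the learning rate 0.001, ε = 0.3 and the 1000-sample trigger follow the description, and `collect_episode` and `ppo_update` refer to the earlier sketches.

```python
import torch

def train(env, actor, critic, build_state_fn, max_rounds=5000):
    """Repeat sampling rounds and network updates until the maximum number of rounds
    is reached; an update is triggered once the experience pool holds 1000 samples."""
    actor_opt = torch.optim.Adam(actor.parameters(), lr=0.001)    # learning rate alpha = 0.001
    critic_opt = torch.optim.Adam(critic.parameters(), lr=0.001)
    pool = []
    for _ in range(max_rounds):
        pool.extend(collect_episode(env, actor, build_state_fn))
        if len(pool) >= 1000:                                     # designated pool capacity reached
            s, a, r, s_next, logp_old = zip(*pool)
            batch = (torch.stack(s), torch.stack(a),
                     torch.tensor(r, dtype=torch.float32),
                     torch.stack(s_next), torch.stack(logp_old))
            ppo_update(actor, critic, actor_opt, critic_opt, batch, eps=0.3)
            pool.clear()
```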
The test result is shown in FIG. 2: the path length is 610.38 km and the voyage time is 337413 s.

Claims (4)

1. An AUV path planning method based on local perception and near-end optimization strategies, the path planning method comprising:
(1) Obtaining obstacle information and ocean current information, and constructing a three-dimensional environment according to the information;
(2) Constructing a critic network for evaluating the action and an actor network for outputting the action, and initializing network parameters;
(3) Selecting an action according to the output of the neural network, acquiring a sample, and putting the sample into an experience pool for later learning;
(4) Designing a reward function considering a plurality of factors;
(5) Training is performed using samples of the experience pool until a maximum number of rounds is reached, outputting a path.
2. The AUV path planning method based on local perception and near-end optimization strategy as claimed in claim 1, wherein the reward function calculation formula in step (4) is as follows:
R_d = arctan(k1·(ξ_t − ξ_{t+1}) − δ_d)
where ξ_t denotes the distance between the AUV's current position and the target point, and δ_d is a bias term that makes it harder for the AUV to obtain a positive reward.
The reward R_c associated with the ocean current is set according to the ratio of the actual speed (the AUV velocity combined with the current) to the AUV's own speed. [The piecewise formula for R_c appears only as equation images in the original publication.] When the target is reachable and the current has a positive influence on the AUV's motion, the actual speed should be greater than the AUV speed. The parameter τ_c is typically set to 0.5 to encourage the AUV to make greater use of the current. R_c decreases as the angle between the current and the AUV's motion increases, and increases with the current speed. When the current has a negative influence or is poorly utilized, the formula penalizes the agent through δ_c.
The final reward function is set as R = k1·R_d + k2·R_c.
3. The AUV path planning method based on local perception and near-end optimization strategy as claimed in claim 1, wherein in step (4) a critic network and an actor network are constructed and trained using the samples, and the update formula of the actor network is derived as follows:
The objective function is the clipped surrogate objective
L^CLIP(θ) = E_t[ min( r_t(θ)·Â_t, clip(r_t(θ), 1 − ε, 1 + ε)·Â_t ) ],
where r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t) is the probability ratio between the new and old policies and Â_t is the advantage estimate.
The gradient of the objective function with respect to the actor parameters θ is ∇_θ L^CLIP(θ), and the actor network is updated by gradient ascent:
θ ← θ + α·∇_θ L^CLIP(θ),
where α is the learning rate.
When the advantage estimate Â_t is greater than 0, the network parameters are optimized toward increasing the probability of the selected action, but only up to r_t(θ) = 1 + ε; conversely, when Â_t is less than 0, the parameters are optimized toward decreasing the probability of the action, down to r_t(θ) = 1 − ε. This essentially limits the magnitude of each policy update.
In the near-end optimization strategy, the advantage function is estimated with temporal-difference errors. The single-step TD-error is defined as the difference between the discounted bootstrapped return and the critic network's state-value estimate:
δ_t = r_{t+1} + γ·V_β(s_{t+1}) − V_β(s_t).
The advantage estimate is the N-step accumulation of TD-errors:
Â_t = Σ_{l=0}^{N−1} γ^l·δ_{t+l}.
The critic network is updated by minimizing
L^VF(β) = (V_β(s_t) − V_t^targ)^2.
4. The AUV path planning method based on local perception and near-end optimization strategy as claimed in claim 1, wherein in step (5) the parameters of both networks are updated and actions are sampled and selected according to the output probability distribution; the sampling and network-updating process is repeated until a specified maximum number of rounds is reached, and each round ends when the maximum number of steps or the target point is reached.
CN202211219574.9A 2022-09-28 2022-09-28 AUV path planning method based on local perception and near-end optimization strategy Pending CN115493595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211219574.9A CN115493595A (en) 2022-09-28 2022-09-28 AUV path planning method based on local perception and near-end optimization strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211219574.9A CN115493595A (en) 2022-09-28 2022-09-28 AUV path planning method based on local perception and near-end optimization strategy

Publications (1)

Publication Number Publication Date
CN115493595A true CN115493595A (en) 2022-12-20

Family

ID=84472697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211219574.9A Pending CN115493595A (en) 2022-09-28 2022-09-28 AUV path planning method based on local perception and near-end optimization strategy

Country Status (1)

Country Link
CN (1) CN115493595A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN111829527A (en) * 2020-07-23 2020-10-27 中国石油大学(华东) Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements
CN112241176A (en) * 2020-10-16 2021-01-19 哈尔滨工程大学 Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment
CN112698646A (en) * 2020-12-05 2021-04-23 西北工业大学 Aircraft path planning method based on reinforcement learning
CN113159432A (en) * 2021-04-28 2021-07-23 杭州电子科技大学 Multi-agent path planning method based on deep reinforcement learning
CN113532457A (en) * 2021-06-07 2021-10-22 山东师范大学 Robot path navigation method, system, device and storage medium
CN113534668A (en) * 2021-08-13 2021-10-22 哈尔滨工程大学 Maximum entropy based AUV (autonomous Underwater vehicle) motion planning method for actor-critic framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiachen Yang, et al.: "A Time-Saving Path Planning Scheme for Autonomous Underwater Vehicles With Complex Underwater Conditions", IEEE Internet of Things Journal, 12 September 2022 (2022-09-12), pages 1001-1013 *

Similar Documents

Publication Publication Date Title
Jiang et al. Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge
CN111142522B (en) Method for controlling agent of hierarchical reinforcement learning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
CN111780777B (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN107168324B (en) Robot path planning method based on ANFIS fuzzy neural network
WO2018120739A1 (en) Path planning method, apparatus and robot
CN109655066A (en) One kind being based on the unmanned plane paths planning method of Q (λ) algorithm
CN110750096A (en) Mobile robot collision avoidance planning method based on deep reinforcement learning in static environment
Botteghi et al. On reward shaping for mobile robot navigation: A reinforcement learning and SLAM based approach
Xie et al. Learning with stochastic guidance for robot navigation
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
CN113268074A (en) Unmanned aerial vehicle flight path planning method based on joint optimization
CN115248591B (en) UUV path planning method based on mixed initialization wolf particle swarm algorithm
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
Yan et al. A novel path planning for AUV based on objects’ motion parameters predication
Song et al. Autonomous mobile robot navigation using machine learning
Jacinto et al. Navigation of autonomous vehicles using reinforcement learning with generalized advantage estimation
CN116448119A (en) Unmanned swarm collaborative flight path planning method for sudden threat
CN115493595A (en) AUV path planning method based on local perception and near-end optimization strategy
CN114740873B (en) Path planning method of autonomous underwater robot based on multi-target improved particle swarm algorithm
Zhang et al. Visual navigation of mobile robots in complex environments based on distributed deep reinforcement learning
CN113959446B (en) Autonomous logistics transportation navigation method for robot based on neural network
Duo et al. A deep reinforcement learning based mapless navigation algorithm using continuous actions
Shengjun et al. Improved artificial bee colony algorithm based optimal navigation path for mobile robot
Martin et al. The application of particle swarm optimization and maneuver automatons during non-Markovian motion planning for air vehicles performing ground target search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination