CN111198568A - Underwater robot obstacle avoidance control method based on Q learning


Info

Publication number
CN111198568A
Authority
CN
China
Prior art keywords
underwater robot
penalty
robot
action
value
Prior art date
Legal status
Pending
Application number
CN201911338069.4A
Other languages
Chinese (zh)
Inventor
闫敬
李文飚
杨晛
罗小元
Current Assignee
Yanshan University
Original Assignee
Yanshan University
Priority date
Filing date: 2019-12-23
Publication date: 2020-05-26
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN201911338069.4A
Publication of CN111198568A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/04: Control of altitude or depth
    • G05D1/06: Rate of change of altitude or depth
    • G05D1/0692: Rate of change of altitude or depth specially adapted for under-water vehicles
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S15/00: Systems using the reflection or reradiation of acoustic waves, e.g. sonar systems
    • G01S15/88: Sonar systems specially adapted for specific applications
    • G01S15/93: Sonar systems specially adapted for specific applications for anti-collision purposes


Abstract

The invention discloses an underwater robot obstacle avoidance control method based on Q learning, belonging to the field of underwater robot control. The method mainly comprises the following steps: building the current environment from sonar devices arranged around the underwater robot; setting a safety warning distance and a target threshold range for the robot, and determining its position in real time with a positioning technology; creating an action space and a neural network, and initializing the action reward-penalty, the state space and the iteration value; setting a reward-penalty mechanism, selecting each action according to the reward-penalty function, and approaching the target by iterating the Q function until the convergence requirement is met; using a neural network approximation to improve efficiency, and iterating with the gradient descent method. The invention improves the response and learning capability of the underwater robot, achieves high data utilization, and reduces errors.

Description

Underwater robot obstacle avoidance control method based on Q learning
Technical Field
The invention belongs to the technical field of underwater robot control, in particular to optimal control for avoiding underwater obstacles in a timely manner, and specifically relates to an underwater robot obstacle avoidance control method based on Q learning.
Background
The ocean covers about 71% of the earth's surface and is becoming a new exploration space for human beings. An underwater robot senses obstacles through dedicated sensors in order to avoid them. However, marine environments are very complex, featuring reefs, coral, trenches and even sudden events such as rapidly gathering fish shoals, so it is very important that an underwater robot can avoid obstacles reliably during exploration.
The patent application with publication number CN107121985A discloses a radar obstacle avoidance system for an intelligent underwater robot; the scheme uses a radar transceiver as the main carrier and combines it with a single-chip microcomputer timer to avoid obstacles. Although this method can perform obstacle avoidance for an underwater robot, radar relies on electromagnetic waves, which attenuate rapidly underwater and weaken the received signal, so obstacle avoidance is not timely and the robot may collide.
Furthermore, the patent application with publication number CN108829134A discloses a real-time autonomous obstacle avoidance method for a deep-sea robot. It models irregular obstacles with geometric spheres, projects them onto horizontal and vertical planes, and uses a tangent method to analyze the heading region made infeasible by the obstacles, obtaining the infeasible heading set for the unmanned underwater vehicle's navigation; it analyzes the motion characteristics of the vehicle to obtain its heading window and linear velocity window; it searches for an optimal navigation angle by constructing an optimal navigation angle optimization function and builds a leading line-speed model from the obstacle distribution and the yaw angle; finally, the navigation angle and linear speed are output to the vehicle's motion control module to guide real-time obstacle avoidance in a three-dimensional environment. However, this method requires complicated and time-consuming analysis and calculation and cannot cope with seabed emergencies such as fish swarms moving about. It is therefore necessary to design an obstacle avoidance control method for underwater robots that is both timely and highly adaptable, so that seabed emergencies can be avoided in time and various complex seabed conditions can be handled.
Disclosure of Invention
The invention aims to provide an underwater robot obstacle avoidance control method that avoids obstacles in a timely manner, has strong adaptability and is widely applicable.
To achieve this aim, the invention adopts the following technical scheme:
an underwater robot obstacle avoidance control method based on Q learning comprises the following steps:
Step 1: establish the current environment of the robot from the signals of the sonar receiving devices arranged on the underwater robot; the underwater robot adopts the dynamic model
M·v̇ + C·v + D·v + G = τ    (1)
Wherein M represents an inertia matrix, C represents a Coriolis force matrix, D represents a damping matrix, G represents a gravity matrix, τ is a control input, and v is a control output;
the underwater robot has 6 degrees of freedom; let xn denote the distance between the robot and an obstacle in the nth degree of freedom, and let d be the safety warning distance set by the underwater robot; if xn < d in the nth degree of freedom, the underwater robot is likely to collide and takes a corresponding evasive action in that degree of freedom;
Step 2: determine the position of the underwater robot at each moment by using a positioning technology, and compute the distance Di between the underwater robot and the target point, where i denotes the ith moment; compare Di with the distance Di−1 at the previous moment: if Di > Di−1, the robot is moving away from the target point; if Di < Di−1, the robot is approaching the target point; calculate the distance D between the underwater robot and the target point at the current moment and, considering underwater disturbances, set a target point threshold d0; if D < d0, the underwater robot has reached the target point; establish an action space A according to the degrees of freedom of the underwater robot;
Step 3: the underwater robot uses Q learning to select the action with the minimum penalty, a per-step reward-penalty mechanism is set, and the initial penalty is set to K; first, the distance reward-penalty function R1 between the underwater robot and the target point is given by the following formula,
R1 = K, if Di > Di−1;  R1 = −K, if Di < Di−1    (2)
that is, a penalty K is given when Di > Di−1 and a negative penalty −K is given when Di < Di−1; second, the reward-penalty function R2 for approaching an obstacle within the safety warning distance is given by
[Formula (3), given only as an image in the original: the reward-penalty function R2 as a function of the obstacle distance xn.]
the above formula indicates that once an obstacle enters the safety warning distance, the reward-penalty value increases as the distance between the underwater robot and the obstacle decreases; when the obstacle is outside the safety warning distance, the reward-penalty value is K; the total reward-penalty of each step of the underwater robot is R = R1 + R2; meanwhile, the underwater robot avoids obstacles according to the reward-penalty function: when the penalty of the current step is larger than that of the previous step, the underwater robot is approaching an obstacle and moves away from it; when the penalty of the current step is smaller than that of the previous step, the underwater robot is moving away from the obstacle and moves toward the target point;
Step 4: assign weights to the multidimensional input by means of a neural network, and copy the actual network weights into the target network weights after each training round; the weight update formula is as follows
netl = Σ (m = 1 to M) ωm·xm,  yl = f(netl)    (4)
where xm is the input signal, ωm is the corresponding weight, M is the total number of neurons, netl is the input-output relation, f is the activation function, and yl is the neuron output;
Step 5: train the underwater robot to search for an optimal obstacle avoidance path: initialize the action reward-penalty R, the state matrix S and the total number of training rounds M; set an iteration counter j representing the number of completed training rounds; set a discount factor γ; then, according to the Q function
Q(s, a) = R(s, a) + γ max_a′ Q(s′, a′)    (5)
the above equation states that the Q value of taking action a in state s equals the reward-penalty R(s, a) plus the discounted highest Q value of the next state s′; gradient descent is performed on the Q value being maximized so as to minimize the per-step penalty; at each step the updated state is fed into the Q learning network, which returns the Q values of all possible actions in that state; an action is then selected: a random action a is chosen when the Q values of the candidate actions are all equal, and the action with the highest Q value is chosen otherwise; after action a is selected, the underwater robot executes it in state s, moves to the new state s′, and receives the reward-penalty R; these steps are repeated for M rounds until the Q value meets the convergence requirement.
The technical scheme of the invention is further improved as follows: in step 2, the target point threshold range is a circular area with d0 as a radius and the target point as a center.
The technical scheme of the invention is further improved as follows: in step 1, the safety alert range is a circular moving area with d as a radius and the center of mass of the underwater robot as the center of a circle.
The technical scheme of the invention is further improved as follows: the convergence requirement on the Q value in step 5 is that the Q value of the current step differs from that of the previous step by no more than 0.01, at which point the Q value is considered to have converged.
Due to the adoption of the technical scheme, the invention has the following technical effects:
1. To cope with seabed emergencies, the method equips the bow, stern, port and starboard of the robot with ranging sonar and forward-looking sonar, so the surrounding obstacle situation can be measured in time and obstacles can be avoided effectively.
2. To cope with complex seabed terrain, the input is weighted by a neural network combined with Q learning, and the weights are updated from experience at every step, which yields high data utilization efficiency. Learning directly from consecutive samples is inefficient because consecutive samples are highly correlated; combining the neural network with random sampling of stored experience breaks this correlation and reduces the variance of the weight updates (see the sketch after this list).
3. The method additionally maintains a separate target network to handle the TD error of the temporal-difference algorithm. The method therefore has strong learning ability, adapts to the environment quickly, and is better suited to complex tasks when applied to obstacle avoidance control of underwater robots.
4. In the method, the underwater robot uses Q learning to select the action with the minimum penalty and a per-step reward-penalty mechanism is set; by defining a reasonable per-step total reward-penalty function, the underwater robot avoids obstacles more accurately and reasonably.
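To make effect 2 concrete, the sketch below shows the kind of experience store with random draws that breaks the correlation between consecutive samples; the class name, buffer capacity and batch size are assumptions for illustration and are not specified in the patent.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal sketch of experience replay: transitions are stored and later
    drawn at random, which decorrelates the samples used for each update."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward_penalty, next_state):
        # store one transition observed by the underwater robot
        self.buffer.append((state, action, reward_penalty, next_state))

    def sample(self, batch_size=32):
        # random draws break the correlation between consecutive samples
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```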
Drawings
FIG. 1 is a flow chart of an underwater robot learning process;
FIG. 2 is a schematic diagram of an obstacle avoidance of the underwater robot on a simulated seabed;
in fig. 2: u is an underwater robot; g is a target point; x is an obstacle; 1, 2, 3, 4, and robot learning training.
Detailed Description
The invention is described in further detail below with reference to the following figures and specific embodiments:
As shown in FIGS. 1 and 2, the invention discloses an underwater robot obstacle avoidance control method based on Q learning. The method mainly concerns an autonomous, untethered underwater robot U that senses the surrounding environment through sonar receiving devices mounted around it and performs autonomous underwater obstacle avoidance with its own control system. The method takes timeliness into account and has strong adaptability.
The obstacle avoidance method comprises the following steps:
Step 1: establish the current environment of the robot from the signals of the sonar receiving devices arranged on the underwater robot. The underwater robot adopts the dynamic model
M·v̇ + C·v + D·v + G = τ    (1)
Wherein M represents an inertia matrix, C represents a Coriolis force matrix, D represents a damping matrix, G represents a gravity matrix, τ is a control input, and v is a control output.
The underwater robot has 6 degrees of freedom. Let xn denote the distance between the robot and an obstacle in the nth degree of freedom. The safety warning distance set by the underwater robot is d, where the safety warning range is a circular region of radius d centered on the robot's center of mass. If xn < d in the nth degree of freedom, the underwater robot may collide and a corresponding evasive action is taken in that degree of freedom.
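As an illustration of the model in equation (1), the following minimal sketch integrates the dynamics for one time step; the function name, the time step dt, and the treatment of C, D and G as constant over a step are assumptions of the example, not values taken from the patent.

```python
import numpy as np

def step_dynamics(v, tau, M, C, D, G, dt=0.05):
    """One explicit-Euler step of the 6-DOF model M*v_dot + C*v + D*v + G = tau
    (equation (1)). v and tau are length-6 vectors; M, C and D are 6x6 matrices
    and G is the length-6 gravity/restoring term (treated as constant here)."""
    v_dot = np.linalg.solve(M, tau - C @ v - D @ v - G)  # solve M*v_dot = tau - C*v - D*v - G
    return v + dt * v_dot
```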
Step 2: determine the position of the underwater robot at each moment by using a positioning technology, and compute the distance Di between the underwater robot and the target point, where i denotes the ith moment. Compare Di with the distance Di−1 at the previous moment: if Di > Di−1, the robot is moving away from the target point; if Di < Di−1, the robot is approaching the target point. Calculate the distance D between the underwater robot and the target point at the current moment and, considering underwater disturbances, set a target point threshold d0, where the target point threshold range is a circular area of radius d0 centered on the target point; if D < d0, the underwater robot has reached the target point. Establish an action space A according to the degrees of freedom of the underwater robot.
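The per-degree-of-freedom safety check of step 1 and the target-threshold test of step 2 can be expressed in a few lines. This is a hedged sketch: the function names, the vector representation of positions and the use of NumPy are assumptions of the example rather than details from the patent.

```python
import numpy as np

def dofs_needing_avoidance(obstacle_dist, d):
    """Indices of the degrees of freedom whose obstacle distance x_n is below
    the safety warning distance d (step 1)."""
    return np.where(np.asarray(obstacle_dist) < d)[0]

def target_reached(robot_pos, target_pos, d0):
    """Step 2: the target counts as reached once the robot lies inside the
    circular threshold region of radius d0 around the target point."""
    return np.linalg.norm(np.asarray(robot_pos) - np.asarray(target_pos)) < d0

def moving_away_from_target(D_i, D_prev):
    """True when D_i > D_{i-1}, i.e. the robot moved away from the target."""
    return D_i > D_prev
```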
Step 3: the underwater robot uses Q learning to select the action with the minimum penalty, a per-step reward-penalty mechanism is set, and the initial penalty is set to K. First, the distance reward-penalty function R1 between the underwater robot and the target point is given by
R1 = K, if Di > Di−1;  R1 = −K, if Di < Di−1    (2)
That is, a penalty K is given when Di > Di−1 and a negative penalty −K is given when Di < Di−1. Second, the reward-penalty function R2 for approaching an obstacle within the safety warning distance is given by
[Formula (3), given only as an image in the original: the reward-penalty function R2 as a function of the obstacle distance xn.]
The above formula indicates that once an obstacle enters the safety warning distance, the reward-penalty value increases as the distance between the underwater robot and the obstacle decreases; when the obstacle is outside the safety warning distance, the reward-penalty value is K. The total reward-penalty per step of the underwater robot is R = R1 + R2. Action selection is then carried out according to this reward-penalty mechanism of the underwater robot: with the reward-penalty mechanism serving as a cost function, the robot approaches the target by iteratively seeking the minimum penalty.
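A per-step reward-penalty following this description might look like the sketch below. R1 follows equation (2) exactly; the exact shape of R2 inside the safety warning distance is not reproduced in the text, so the K·d/x_n form used here (equal to K at the boundary and growing as the obstacle distance shrinks) is an assumption, as are the function name and the default value of K.

```python
def reward_penalty(D_i, D_prev, obstacle_dist, d, K=1.0):
    """Total per-step reward-penalty R = R1 + R2 (smaller is better here).

    R1 (equation (2)): +K when the robot moved away from the target
    (D_i > D_{i-1}), -K when it moved closer.
    R2: K outside the safety warning distance d; inside it, an assumed
    K * d / x_n that increases as the closest obstacle distance x_n shrinks.
    """
    r1 = K if D_i > D_prev else -K
    x_n = min(obstacle_dist)                      # closest obstacle over all DOFs
    r2 = K * d / max(x_n, 1e-6) if x_n < d else K
    return r1 + r2
```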
Step 4: assign weights to the multidimensional input by means of a neural network, and copy the actual network weights into the target network weights after each training round. The weight update is as follows
netl = Σ (m = 1 to M) ωm·xm,  yl = f(netl)    (4)
where xm is the input signal, ωm is the corresponding weight, M is the total number of neurons, netl is the input-output relation, f is the activation function, and yl is the neuron output.
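The actual/target network pair of step 4 can be sketched as below. Only the weighted sum net_l = Σ ωm·xm, the activation yl = f(netl) and the copying of the actual weights into the target weights after each training round come from the description; the single-layer structure, the sigmoid activation and the layer sizes are assumptions of this example.

```python
import numpy as np

class ActualTargetNet:
    """Single-layer sketch of the actual/target network pair from step 4."""

    def __init__(self, n_inputs, n_neurons, seed=0):
        rng = np.random.default_rng(seed)
        self.w_actual = rng.normal(scale=0.1, size=(n_neurons, n_inputs))
        self.w_target = self.w_actual.copy()

    @staticmethod
    def _f(net):
        # activation function f (sigmoid chosen for the example)
        return 1.0 / (1.0 + np.exp(-net))

    def forward(self, x, use_target=False):
        w = self.w_target if use_target else self.w_actual
        net = w @ np.asarray(x)        # net_l = sum_m w_{l,m} * x_m  (equation (4))
        return self._f(net)            # y_l = f(net_l)

    def sync_target(self):
        """Copy the actual network weights into the target network weights."""
        self.w_target = self.w_actual.copy()
```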
Step 5: train the underwater robot to search for an optimal obstacle avoidance path: initialize the action reward-penalty R, the state matrix S and the total number of training rounds M; set an iteration counter j; set a discount factor γ; then, according to the Q function
Q(s, a) = R(s, a) + γ max_a′ Q(s′, a′)    (5)
The above equation states that the Q value of taking action a in state s equals the reward-penalty R(s, a) plus the discounted highest Q value of the next state s′. Gradient descent is performed on the Q value being maximized so as to minimize the per-step penalty. At each step the updated state is fed into the Q learning network, which returns the Q values of all possible actions in that state. An action is then selected: a random action a is chosen when the Q values of the candidate actions are all equal, and the action with the highest Q value is chosen otherwise. After action a is selected, the underwater robot executes it in state s, moves to the new state s′, and receives the reward-penalty R. These steps are repeated for M rounds until the Q value meets the convergence requirement. In a specific implementation, the convergence requirement is that the Q value of the current step differs from that of the previous step by no more than 0.01, at which point the Q value is considered to have converged. The neural network approximation is adopted to improve efficiency, and the gradient descent method is used to iterate toward the optimal control strategy.
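The step-5 training loop can be summarized by the tabular sketch below. The patent itself approximates Q with the neural network from step 4 and iterates by gradient descent; a Q-table with a learning rate alpha is used here only to keep the example self-contained, and the hypothetical `env` object (exposing reset() and step(state, action) -> (reward_penalty, next_state, done)) is an assumption of the example. The tie-breaking action selection, the discount factor, the M training rounds and the 0.01 convergence test follow the description.

```python
import numpy as np

def train_q(env, n_states, n_actions, gamma=0.9, alpha=0.1,
            M=500, eps_converge=0.01, rng=np.random.default_rng(0)):
    """Tabular sketch of the step-5 loop; `env` is a hypothetical helper."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(M):                               # M training rounds
        s, done, q_prev = env.reset(), False, Q.copy()
        while not done:
            q_row = Q[s]
            if np.allclose(q_row, q_row[0]):         # all Q values equal:
                a = int(rng.integers(n_actions))     # pick a random action a
            else:
                a = int(np.argmax(q_row))            # else the highest-Q action
            r, s_next, done = env.step(s, a)
            # Q(s,a) = R(s,a) + gamma * max_a' Q(s',a')  (equation (5)),
            # applied here as an incremental update
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
        if np.max(np.abs(Q - q_prev)) <= eps_converge:   # 0.01 convergence test
            break
    return Q
```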
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and not restrictive, and various changes and modifications may be made to the technical solution of the present invention by those skilled in the art without departing from the spirit of the present invention, which is defined by the claims.

Claims (4)

1. An underwater robot obstacle avoidance control method based on Q learning is characterized in that: the method comprises the following steps:
step 1, establishing the current environment of a robot through signals of a sonar receiving device arranged on an underwater robot; the underwater robot adopts a dynamic model of
M·v̇ + C·v + D·v + G = τ    (1)
Wherein M represents an inertia matrix, C represents a Coriolis force matrix, D represents a damping matrix, G represents a gravity matrix, τ is a control input, and v is a control output;
the underwater robot has 6 degrees of freedom; xn denotes the distance between the robot and an obstacle in the nth degree of freedom; the underwater robot sets a safety warning distance d; if xn < d in the nth degree of freedom, the underwater robot is likely to collide and takes a corresponding evasive action in that degree of freedom;
step 2, determining the position of the underwater robot at each moment by using a positioning technology, and computing the distance Di between the underwater robot and the target point, wherein i denotes the ith moment; comparing Di with the distance Di−1 at the previous moment: if Di > Di−1, the robot is moving away from the target point; if Di < Di−1, the robot is approaching the target point; calculating the distance D between the underwater robot and the target point at the current moment and, considering underwater disturbances, setting a target point threshold d0; if D < d0, the underwater robot has reached the target point; establishing an action space A according to the degrees of freedom of the underwater robot;
step 3, using Q learning of the underwater robot to select the action with the minimum penalty, setting a per-step reward-penalty mechanism and setting an initial penalty K; first, the distance reward-penalty function R1 between the underwater robot and the target point is given by the following formula,
R1 = K, if Di > Di−1;  R1 = −K, if Di < Di−1    (2)
that is, a penalty K is given when Di > Di−1 and a negative penalty −K is given when Di < Di−1; second, the reward-penalty function R2 for approaching an obstacle within the safety warning distance is given by
[Formula (3), given only as an image in the original: the reward-penalty function R2 as a function of the obstacle distance xn.]
wherein the above formula indicates that once an obstacle enters the safety warning distance, the reward-penalty value increases as the distance between the underwater robot and the obstacle decreases; when the obstacle is outside the safety warning distance, the reward-penalty value is K, and the total reward-penalty of each step of the underwater robot is R = R1 + R2; meanwhile, the underwater robot avoids obstacles according to the reward-penalty function: when the penalty of the current step is larger than that of the previous step, the underwater robot is approaching an obstacle and moves away from it; when the penalty of the current step is smaller than that of the previous step, the underwater robot is moving away from the obstacle and moves toward the target point;
step 4, assigning weights to the multidimensional input by means of a neural network, and copying the actual network weights into the target network weights after each training round, wherein the weight update formula is as follows
netl = Σ (m = 1 to M) ωm·xm,  yl = f(netl)    (4)
wherein xm is the input signal, ωm is the corresponding weight, M is the total number of neurons, netl is the input-output relation, f is the activation function, and yl is the neuron output;
step 5, training the underwater robot to search for an optimal obstacle avoidance path: initializing the action reward-penalty R; initializing the state matrix S; initializing the total number of training rounds M of the robot; setting an iteration counter j representing the number of completed training rounds; setting a discount factor γ; and, according to the Q function
Q(s, a) = R(s, a) + γ max_a′ Q(s′, a′)    (5)
wherein the above equation states that the Q value of taking action a in state s equals the reward-penalty R(s, a) plus the discounted highest Q value of the next state s′; gradient descent is performed on the Q value being maximized so as to minimize the per-step penalty; at each step the updated state is fed into the Q learning network, which returns the Q values of all possible actions in that state; an action is then selected: a random action a is chosen when the Q values of the candidate actions are all equal, and the action with the highest Q value is chosen otherwise; after action a is selected, the underwater robot executes it in state s, moves to the new state s′, and receives the reward-penalty R; these steps are repeated for M rounds until the Q value meets the convergence requirement.
2. The Q learning-based underwater robot obstacle avoidance control method according to claim 1, characterized in that: in step 2, the target point threshold range is a circular area with d0 as a radius and the target point as a center.
3. The Q learning-based underwater robot obstacle avoidance control method according to claim 1, characterized in that: in step 1, the safety alert range is a circular moving area with d as a radius and the center of mass of the underwater robot as the center of a circle.
4. The Q learning-based underwater robot obstacle avoidance control method according to claim 1, characterized in that: the convergence requirement on the Q value in step 5 is that the Q value of the current step differs from that of the previous step by no more than 0.01, at which point the Q value is considered to have converged.
CN201911338069.4A 2019-12-23 2019-12-23 Underwater robot obstacle avoidance control method based on Q learning Pending CN111198568A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911338069.4A | 2019-12-23 | 2019-12-23 | Underwater robot obstacle avoidance control method based on Q learning (published as CN111198568A)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911338069.4A | 2019-12-23 | 2019-12-23 | Underwater robot obstacle avoidance control method based on Q learning (published as CN111198568A)

Publications (1)

Publication Number | Publication Date
CN111198568A (en) | 2020-05-26

Family

ID=70744597

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911338069.4A | Underwater robot obstacle avoidance control method based on Q learning (CN111198568A, pending) | 2019-12-23 | 2019-12-23

Country Status (1)

Country Link
CN (1) CN111198568A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102795323A (en) * 2011-05-25 2012-11-28 中国科学院沈阳自动化研究所 Unscented Kalman filter (UKF)-based underwater robot state and parameter joint estimation method
CN109240091A (en) * 2018-11-13 2019-01-18 燕山大学 A kind of underwater robot control method based on intensified learning and its control method tracked
CN109540151A (en) * 2018-03-25 2019-03-29 哈尔滨工程大学 A kind of AUV three-dimensional path planning method based on intensified learning
CN109933086A (en) * 2019-03-14 2019-06-25 天津大学 Unmanned plane environment sensing and automatic obstacle avoiding method based on depth Q study
CN110333739A (en) * 2019-08-21 2019-10-15 哈尔滨工程大学 A kind of AUV conduct programming and method of controlling operation based on intensified learning
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102795323A (en) * 2011-05-25 2012-11-28 中国科学院沈阳自动化研究所 Unscented Kalman filter (UKF)-based underwater robot state and parameter joint estimation method
CN109540151A (en) * 2018-03-25 2019-03-29 哈尔滨工程大学 A kind of AUV three-dimensional path planning method based on intensified learning
CN109240091A (en) * 2018-11-13 2019-01-18 燕山大学 A kind of underwater robot control method based on intensified learning and its control method tracked
CN109933086A (en) * 2019-03-14 2019-06-25 天津大学 Unmanned plane environment sensing and automatic obstacle avoiding method based on depth Q study
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN110333739A (en) * 2019-08-21 2019-10-15 哈尔滨工程大学 A kind of AUV conduct programming and method of controlling operation based on intensified learning

Similar Documents

Publication Publication Date Title
CN109540151B (en) AUV three-dimensional path planning method based on reinforcement learning
Sun et al. Mapless motion planning system for an autonomous underwater vehicle using policy gradient-based deep reinforcement learning
JP6854549B2 (en) AUV action planning and motion control methods based on reinforcement learning
CN109765929B (en) UUV real-time obstacle avoidance planning method based on improved RNN
CN108803313B (en) Path planning method based on ocean current prediction model
Cao et al. Target search control of AUV in underwater environment with deep reinforcement learning
CN111273670B (en) Unmanned ship collision prevention method for fast moving obstacle
CN109241552A (en) A kind of underwater robot motion planning method based on multiple constraint target
Wang et al. Cooperative collision avoidance for unmanned surface vehicles based on improved genetic algorithm
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
US20230286158A1 (en) Autonomous sense and guide machine learning system
Yan et al. Real-world learning control for autonomous exploration of a biomimetic robotic shark
Wang et al. Obstacle avoidance for environmentally-driven USVs based on deep reinforcement learning in large-scale uncertain environments
CN109916400B (en) Unmanned ship obstacle avoidance method based on combination of gradient descent algorithm and VO method
Hadi et al. Adaptive formation motion planning and control of autonomous underwater vehicles using deep reinforcement learning
Wu et al. Multi-vessels collision avoidance strategy for autonomous surface vehicles based on genetic algorithm in congested port environment
Zhang et al. Intelligent vector field histogram based collision avoidance method for auv
CN117311160A (en) Automatic control system and control method based on artificial intelligence
CN111198568A (en) Underwater robot obstacle avoidance control method based on Q learning
Jose et al. Navigating the Ocean with DRL: Path following for marine vessels
CN114609925B (en) Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
Li et al. LSDA-APF: A Local Obstacle Avoidance Algorithm for Unmanned Surface Vehicles Based on 5G Communication Environment.
Ferrandino et al. A Comparison between Crisp and Fuzzy Logic in an Autonomous Driving System for Boats
Tang et al. Path planning of autonomous underwater vehicle in unknown environment based on improved deep reinforcement learning
US20220371709A1 (en) Path planning system and method for sea-aerial cooperative underwater target tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200526)