CN111880535A - Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning


Info

Publication number
CN111880535A
Authority
CN
China
Prior art keywords
unmanned ship
obstacle avoidance
reinforcement learning
reward
network
Prior art date
Legal status
Granted
Application number
CN202010715076.8A
Other languages
Chinese (zh)
Other versions
CN111880535B (en)
Inventor
张卫东
王雪纯
徐鑫莉
蔡云泽
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
2020-07-23
Filing date
2020-07-23
Publication date
2020-11-03
Application filed by Shanghai Jiaotong University
Priority to CN202010715076.8A
Publication of CN111880535A
Application granted
Publication of CN111880535B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0257 Control of position or course in two dimensions specially adapted to land vehicles using a radar
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to an unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning, wherein the method comprises the following steps: 1) building a marine environment; 2) setting an action space according to the propeller configuration of the unmanned ship, and learning a reinforcement learning state code from the global planning information provided by the static chart and the obstacle information within the detection radius of the radar system; 3) setting reward target weights to obtain a comprehensive reward function; 4) establishing and training an evaluation network and a policy network; 5) inputting the reinforcement learning state code into the evaluation network and the policy network respectively, inputting the comprehensive reward function into the evaluation network, and determining the controller output as the action corresponding to the mean of the learned policy network. Compared with the prior art, the invention has a strong self-learning ability, can adapt to different large-scale complex environments with simple deployment training, and thereby realizes autonomous perception, autonomous navigation and autonomous obstacle avoidance.

Description

Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning
Technical Field
The invention relates to an unmanned ship autonomous obstacle avoidance method and system, in particular to an unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning.
Background
The unmanned ship is an unmanned surface vehicle capable of autonomous navigation, autonomous obstacle avoidance and autonomous water surface operation, with the advantages of small size, high speed, good stealth and no risk of casualties. It is well suited to water surface tasks in dangerous sea areas that pose a high casualty risk to personnel, and to simple water surface tasks requiring little human participation; it offers a good cost-effectiveness ratio and has been widely and effectively applied in fields such as ocean monitoring, ocean survey, maritime search and rescue and unmanned freight.
At present, the mainstream approach to autonomous navigation of unmanned ships is to deploy autonomous perception, autonomous navigation and autonomous obstacle avoidance algorithms separately and have them cooperate to complete navigation and operation tasks. For example, vision-system perception involves algorithms such as pattern recognition and target detection; global-planning autonomous navigation is mainly realized with grid-map methods, the A* algorithm, genetic algorithms and the like; and local dynamic collision avoidance mainly applies methods such as the artificial potential field method and optimal reciprocal collision avoidance. Although these methods perform well in their respective application settings, each functional module must be designed carefully and the parameters of the combined algorithm must be configured and tuned as a whole, which makes the intelligent algorithms of the unmanned ship complex and tedious to implement. Furthermore, because these methods lack the ability to learn autonomously, they adapt poorly to large-scale complex environments, and the algorithm modules must be redesigned and recombined to cooperate for each different environment.
Disclosure of Invention
The object of the invention is to overcome the above defects of the prior art and to provide a reinforcement learning based unmanned ship hybrid perception autonomous obstacle avoidance method and system with the ability to learn autonomously and adapt to environmental characteristics.
The purpose of the invention can be realized by the following technical scheme:
an unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning comprises the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and the marine environment, generating random obstacles, and randomly generating an initial point and a final point of the unmanned ship;
2) setting an action space and a state space: setting an action space according to the situation of the propeller of the unmanned ship, and learning according to global planning information provided by the static chart and obstacle information in the detection radius range of the radar system to obtain a reinforcement learning state code;
3) determining a reward function: setting reward target weight to obtain a comprehensive reward function;
4) establishing and training an evaluation network and a policy network: the evaluation network and the policy network are each formed by connecting the state coding network to a perceptron, and the network parameters are initialized and trained;
5) agent decision and controller output: the reinforcement learning state code is input into the evaluation network and the policy network respectively, the comprehensive reward function is input into the evaluation network, and the controller output is determined as the action corresponding to the mean of the learned policy network.
Preferably, the interaction rule between the unmanned ship and the marine environment in step 1) follows the unmanned ship's own dynamic equations.
Preferably, the random obstacles generated in step 1) include 4 kinds: random static obstacles that can be delineated by a chart, random dynamic obstacles that cannot be delineated by a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability.
Preferably, the action space in step 2) comprises the discretized surge force and yaw moment.
Preferably, the reinforcement learning state code in step 2) is obtained through deep network learning, specifically:
the features of the static chart are learned by a convolutional neural network combined with fully connected layers to obtain a static planning state code; the static planning state code and the dynamic obstacle avoidance state code fed back by the radar system processing are taken as the key features of the reinforcement learning state code, and their importance is redistributed through a learned overall weight matrix to obtain the final reinforcement learning state code.
Preferably, the dynamic obstacle avoidance state code is:

$$s_t^{dyn} = \left( \sigma_t, \; d_t^{goal}, \; \theta_t^{goal}, \; \psi_t, \; u_t, \; v_t, \; r_t, \; d_t^{obs}, \; \theta_t^{obs} \right)$$

wherein $\sigma_t$ is the obstacle-detected flag within the detection radius, $d_t^{goal}$ is the distance from the unmanned ship to the target in the world coordinate system, $\theta_t^{goal}$ is the angle from the unmanned ship to the target in the world coordinate system, $\psi_t$ is the yaw angle of the unmanned ship in the world coordinate system, $u_t$ is the surge speed in the body-fixed coordinate system of the unmanned ship, $v_t$ is the sway speed, $r_t$ is the yaw rate, $d_t^{obs}$ is the distance to the nearest obstacle in the world coordinate system, $\theta_t^{obs}$ is the angle to the nearest obstacle in the world coordinate system, and the subscript $t$ denotes time $t$.
Preferably, the comprehensive reward function in step 3) is the product of a reward target weight matrix and the reward targets, the reward targets comprising: a distance reward target, an obstacle avoidance reward target, a speed reward target and an energy consumption reward target.
Preferably, the reward targets are obtained as follows:
in the task of navigating the unmanned ship to the target point, if $d_{t+1}^{goal} < d_t^{goal}$, the distance reward target $R_{distance} = 1$; otherwise $R_{distance} = 0$, wherein $d_t^{goal}$ is the distance from the unmanned ship to the target in the world coordinate system, the subscript $t$ denotes time $t$ and the subscript $t+1$ denotes time $t+1$;
when the radar detects an obstacle and the unmanned ship is within the range threatened by the obstacle, if $d_{t+1}^{obs} > d_t^{obs}$, the obstacle avoidance reward target $R_{obstacle} = 1$; otherwise $R_{obstacle} = 0$, wherein $d_t^{obs}$ is the distance to the nearest obstacle in the world coordinate system;
if $\sqrt{u_t^2 + v_t^2} > v_{th}$, the speed reward target $R_{speed} = 1$; otherwise $R_{speed} = 0$, wherein $u_t$ is the surge speed in the body-fixed coordinate system of the unmanned ship, $v_t$ is the sway speed and $v_{th}$ is a set speed threshold;
if $|\tau_u| + |\tau_r| < \tau_{th}$, the energy consumption reward target $R_{consumption} = 1$; otherwise $R_{consumption} = 0$, wherein $\tau_u$ is the surge force of the unmanned ship, $\tau_r$ is the yaw moment of the unmanned ship and $\tau_{th}$ is a set energy consumption threshold.
Preferably, step 4) is implemented based on the A3C algorithm.
An unmanned ship hybrid perception autonomous obstacle avoidance system based on reinforcement learning comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the autonomous obstacle avoidance method when running the computer program.
Compared with the prior art, the invention has the following advantages:
the algorithm has a strong self-learning capability and can adapt to different large-scale complex environments with simple deployment training, thereby realizing autonomous perception, autonomous navigation and autonomous obstacle avoidance;
the algorithm integrates environmental perception with navigation and obstacle avoidance, removing the heavy burden of separate configuration and joint parameter tuning that modular algorithm design entails;
the algorithm has both static planning and dynamic collision avoidance capabilities: on the one hand, trajectory planning can be realized by learning the static chart; on the other hand, the algorithm can cope with real-time threats on the sea surface and has a reliable and stable threat avoidance capability.
Drawings
Fig. 1 is a schematic overall structure diagram of the unmanned ship hybrid sensing autonomous obstacle avoidance method based on reinforcement learning.
Fig. 2 is a schematic diagram of state coding of the unmanned ship hybrid perception reinforcement learning algorithm.
Fig. 3 is a parameter explanatory diagram of dynamic obstacle avoidance coding.
Detailed Description
The invention is described in detail below with reference to the figures and a specific embodiment. Note that the following description of the embodiment is merely illustrative; the invention is not limited to the applications or uses described, nor to the following embodiment.
Examples
As shown in fig. 1, an unmanned surface vehicle hybrid perception autonomous obstacle avoidance method based on reinforcement learning includes the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and the marine environment, generating random obstacles, and randomly generating an initial point and a final point of the unmanned ship;
the interaction rule of the unmanned ship and the marine environment follows the self kinetic equation of the unmanned ship:
Figure BDA0002597880580000041
Figure BDA0002597880580000042
wherein eta is [ x, y, psi ═ x, y, psi]TContaining unmanned boat position and yaw angle information, v ═ u, upsilon, r]TIncluding yaw, pitch, yaw rate information, [ tau ═u,0,τt]TThe longitudinal and heading forces of the unmanned ship, M is the mass of the unmanned ship, and R (psi) is the yaw angleψ, C (v), g (v) are each a function of v;
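For simulation, these equations can be integrated numerically. The sketch below assumes a diagonal inertia matrix and a simple linear damping term standing in for the unspecified $C(\nu)$ and $g(\nu)$, since the patent gives no numerical model parameters; all values and function names are illustrative:

```python
import numpy as np

# Assumed model parameters: the patent does not give numerical values
# for M, C or g, so a diagonal inertia matrix and linear damping are
# used here purely for illustration.
M = np.diag([25.8, 33.8, 2.76])   # inertia matrix [kg, kg, kg*m^2] (assumption)
D = np.diag([2.0, 7.0, 0.5])      # linear damping standing in for C(nu), g(nu)

def rotation(psi):
    """Rotation matrix R(psi) from the body-fixed frame to the world frame."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def step(eta, nu, tau_u, tau_r, dt=0.1):
    """One Euler step of eta_dot = R(psi) nu and M nu_dot + D nu = tau."""
    tau = np.array([tau_u, 0.0, tau_r])        # under-actuated: zero sway force
    nu_dot = np.linalg.solve(M, tau - D @ nu)
    eta = eta + rotation(eta[2]) @ nu * dt     # eta = [x, y, psi]
    nu = nu + nu_dot * dt
    return eta, nu
```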
the random obstacles generated include 4 kinds: random static obstacles that can be delineated by a chart, random dynamic obstacles that cannot be delineated by a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability.
For each generated marine environment, 4 pairs of initial and target points are set at random, and the agent interacts for 500 episodes with each pair of initial and target points; a sketch of the environment generation follows.
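A minimal sketch of this environment generation step, assuming a grid representation of the static chart (the patent does not specify a map format; all sizes, counts and field names below are illustrative, and the two autonomy-related obstacle categories are folded into a single "controlled" flag):

```python
import numpy as np

rng = np.random.default_rng()

def make_environment(size=64, n_static=6, n_dynamic=4):
    """Generate random obstacles plus a random start/goal pair."""
    chart = np.zeros((size, size), dtype=np.float32)   # static chart layer
    for _ in range(n_static):                          # chart-delineated static obstacles
        r, c = rng.integers(0, size, 2)
        chart[max(r - 2, 0):r + 3, max(c - 2, 0):c + 3] = 1.0
    # dynamic obstacles are not drawn on the chart; they are only
    # observable through the simulated radar at run time
    dynamic = [{"pos": rng.uniform(0, size, 2),
                "vel": rng.uniform(-0.5, 0.5, 2),
                "controlled": bool(rng.integers(0, 2))}  # with/without autonomy
               for _ in range(n_dynamic)]
    free = np.argwhere(chart == 0)
    start, goal = free[rng.choice(len(free), 2, replace=False)]
    return chart, dynamic, start, goal
```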
2) Setting an action space and a state space: setting an action space according to the situation of the propeller of the unmanned ship, and learning according to global planning information provided by the static chart and obstacle information in the detection radius range of the radar system to obtain a reinforcement learning state code;
the motion space comprises discretized swaying force, discretized surging force and discretized yawing force;
the reinforcement learning state code is obtained through deep network learning, and specifically comprises the following steps:
and learning the characteristics of the static sea chart by combining the convolution neural network and full connection to obtain a static programming state code, taking the static programming state code and the dynamic obstacle avoidance state code fed back by the radar system processing as the key characteristics of the reinforcement learning state code, and redistributing the importance by learning the whole weight matrix to obtain the final reinforcement learning state code.
The dynamic obstacle avoidance state code is:

$$s_t^{dyn} = \left( \sigma_t, \; d_t^{goal}, \; \theta_t^{goal}, \; \psi_t, \; u_t, \; v_t, \; r_t, \; d_t^{obs}, \; \theta_t^{obs} \right)$$

wherein $\sigma_t$ is the obstacle-detected flag within the detection radius, $d_t^{goal}$ is the distance from the unmanned ship to the target in the world coordinate system, $\theta_t^{goal}$ is the angle from the unmanned ship to the target in the world coordinate system, $\psi_t$ is the yaw angle of the unmanned ship in the world coordinate system, $u_t$ is the surge speed in the body-fixed coordinate system of the unmanned ship, $v_t$ is the sway speed, $r_t$ is the yaw rate, $d_t^{obs}$ is the distance to the nearest obstacle in the world coordinate system, $\theta_t^{obs}$ is the angle to the nearest obstacle in the world coordinate system, and the subscript $t$ denotes time $t$.
The action space of the under-actuated unmanned ship is the discretized output of the surge force and the yaw moment, each discretized into 20 levels according to the thrust level. Fig. 2 shows the state code learning process of the reinforcement learning: the static planning state code, i.e. the chart features, is obtained through the combined CNN and FC network and is finally compressed into a 256-dimensional vector. The nine-tuple carrying the dynamic obstacle avoidance state coding information is illustrated in fig. 3. The reinforcement learning state code is the 265-dimensional vector combining the two codes, obtained by multiplying the concatenated state codes by a learned weight matrix, as sketched below.
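The sketch below illustrates this hybrid encoder in PyTorch, with the 256-dimensional chart code, the 9-dimensional dynamic obstacle avoidance code and the learned 265 x 265 weight matrix taken from the description above; the convolutional layer shapes and the class name are assumptions:

```python
import torch
import torch.nn as nn

class HybridStateEncoder(nn.Module):
    """CNN+FC chart branch concatenated with the 9-dim radar nine-tuple,
    re-weighted by a learned matrix into the 265-dim RL state code."""
    def __init__(self, chart_size=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        flat = 32 * (chart_size // 4) ** 2
        self.fc = nn.Linear(flat, 256)              # static planning state code
        self.mix = nn.Linear(265, 265, bias=False)  # learned overall weight matrix

    def forward(self, chart, dyn_code):
        # chart: (B, 1, H, W) static chart; dyn_code: (B, 9) nine-tuple
        static_code = torch.relu(self.fc(self.cnn(chart)))
        combined = torch.cat([static_code, dyn_code], dim=1)  # (B, 265)
        return self.mix(combined)                   # final RL state code
```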
3) Determining a reward function: setting reward target weight to obtain a comprehensive reward function;
the composite reward function is the product of a reward target weight matrix and reward targets, and the reward targets comprise: a distance reward objective, an obstacle avoidance reward objective, a speed reward objective, and an energy consumption reward objective.
The reward objectives are obtained by:
in the task of navigating the unmanned ship to the target point, if
Figure BDA0002597880580000053
Then the distance to the reward target RdistanceNot all right 1, otherwise Rdistance=0,
Figure BDA0002597880580000054
The distance between the unmanned ship and the target in the world coordinate system is shown, subscript t represents the time t, and subscript t +1 represents the time t + 1;
when the radar detects an obstacle and is within the range threatened by the obstacle, if
Figure BDA0002597880580000055
Obstacle avoidance reward target RobstanceNot all right 1, otherwise Robstance=0,
Figure BDA0002597880580000056
The subscript t represents the time t, and the subscript t +1 represents the time t + 1;
if it is not
Figure BDA0002597880580000057
Then the speed reward target RspeedNot all right 1, otherwise Rspeed=0,utIs the surging speed, v, of the coordinate system of the unmanned shiptIs the swaying speed, v, of the coordinate system of the unmanned shipthSetting a speed threshold;
if it is not
Figure BDA0002597880580000058
Then the target R is rewarded for energy consumptionconsumptionNot all right 1, otherwise Rconsumption=0,τuIs the surging force, tau, of the unmanned boatrIs the bow shaking force, tau, of the unmanned boatthA threshold is set for energy consumption.
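The following sketch assembles the four binary reward targets above into the comprehensive reward. The weight values are illustrative assumptions; the patent leaves the reward target weight matrix as a design parameter:

```python
import numpy as np

# Assumed weights: distance, obstacle avoidance, speed, energy consumption.
W = np.array([1.0, 2.0, 0.5, 0.25])

def comprehensive_reward(d_goal_t, d_goal_t1, d_obs_t, d_obs_t1, obstacle_threat,
                         u_t, v_t, tau_u, tau_r, v_th=0.5, tau_th=30.0):
    """Weighted combination of the four binary reward targets."""
    r_distance = 1.0 if d_goal_t1 < d_goal_t else 0.0          # moved toward target
    r_obstacle = 1.0 if obstacle_threat and d_obs_t1 > d_obs_t else 0.0  # moved away
    r_speed = 1.0 if np.hypot(u_t, v_t) > v_th else 0.0        # above speed threshold
    r_energy = 1.0 if abs(tau_u) + abs(tau_r) < tau_th else 0.0  # below energy threshold
    targets = np.array([r_distance, r_obstacle, r_speed, r_energy])
    return float(W @ targets)
```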
4) Establishing and training an evaluation network and a policy network: the evaluation network and the policy network are built based on the A3C algorithm; each is formed by connecting the state coding network to a perceptron, and the network parameters are initialized and then trained. In the network training process, the gradient of the evaluation network is accumulated according to the following update:

$$d\theta \leftarrow d\theta + \frac{\partial \left( r_t + \gamma V(s_{t+1}; \theta) - V(s_t; \theta) \right)^2}{\partial \theta}$$

The gradient of the policy network is accumulated according to the following update:

$$dw \leftarrow dw + \nabla_w \log \pi(a_t \mid s_t; w) \left( r_t + \gamma V(s_{t+1}; \theta) - V(s_t; \theta) \right)$$

wherein $w$ is the network parameter of the policy network, $\theta$ is the network parameter of the evaluation network, $s_t$ is the state code of the unmanned ship at time $t$, $a_t$ is the decision of the unmanned ship at time $t$, $\pi(a_t \mid s_t; w)$ is the probability assigned by the policy network to action $a_t$ in state $s_t$, $r_t$ is the reward given by the environment after the unmanned ship makes decision $a_t$, $V(s_t; \theta)$ is the value predicted by the evaluation network in state $s_t$, and $\gamma$ is the discount factor.
The hybrid perception state codings are learned jointly with the decision making: while the network parameters are updated to obtain $V(s)$ and $\pi(a \mid s)$, the state coding networks that produce the hybrid perception state code are trained at the same time.
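A condensed, single-worker sketch of this one-step actor-critic update follows (in A3C several such workers accumulate gradients into shared networks asynchronously; the network objects, the shared optimizer and the hyperparameters below are assumptions, and the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def a3c_update(policy_net, value_net, optimizer, s_t, a_t, r_t, s_t1, gamma=0.99):
    """One-step actor-critic update matching the gradients above.
    In the patent both heads share the hybrid state encoder."""
    logits = policy_net(s_t)                      # scores over the discrete action space
    log_prob = F.log_softmax(logits, dim=-1)[0, a_t]
    v_t = value_net(s_t).squeeze()
    with torch.no_grad():
        td_target = r_t + gamma * value_net(s_t1).squeeze()
    advantage = td_target - v_t                   # r_t + gamma*V(s') - V(s)
    policy_loss = -log_prob * advantage.detach()  # maximize log pi * advantage
    value_loss = advantage.pow(2)                 # squared TD error
    optimizer.zero_grad()
    (policy_loss + value_loss).backward()
    optimizer.step()
```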
5) Agent decision and controller output: the reinforcement learning state code is input into the evaluation network and the policy network respectively, the comprehensive reward function is input into the evaluation network, and the controller output is determined as the action corresponding to the mean of the learned policy network.
In this embodiment, during training the controller output, i.e. the action selection, is obtained by sampling from the learned mean-variance policy distribution. When the unmanned ship collides, the current training episode is ended early; when 500 training episodes have been completed for the current initial and target points, the procedure returns to step 1) to regenerate the initial and target points; and when 4 pairs of initial and target points have already been set for the current environment, the marine environment itself is regenerated.
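The training schedule above can be summarized as three nested loops. The sketch below assumes an environment object and an agent wrapping the networks from the earlier snippets (all interface names are illustrative):

```python
def train(agent, make_environment, num_environments=100,
          pairs_per_env=4, episodes_per_pair=500):
    """4 random start/goal pairs per environment, 500 episodes per pair,
    with early termination on collision; `agent` and the environment
    object are assumed interfaces."""
    for _ in range(num_environments):
        env = make_environment()
        for _ in range(pairs_per_env):
            env.reset_start_and_goal()            # new random start/goal pair
            for _ in range(episodes_per_pair):
                s = env.reset()
                done = False
                while not done:
                    a = agent.sample_action(s)    # sample from policy distribution
                    s_next, r, done, collided = env.step(a)
                    agent.update(s, a, r, s_next)  # A3C update as sketched earlier
                    s = s_next
                    if collided:                  # collision ends the episode early
                        done = True
```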
In the actual test environment, the marine environment, initial point and target point are regenerated. The unmanned ship interacts with the marine environment to observe the global planning and local obstacle avoidance information, obtains the reinforcement learning state code through the networks trained in step 4), and executes the action corresponding to the mean of the policy distribution under that state code, i.e. the controller output, thereby completing the set marine operation task.
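At test time the controller is therefore deterministic rather than sampled. Below is a minimal sketch, reusing the encoder and policy network interfaces assumed in the earlier snippets; since the action space here is discretized, the highest-probability (argmax) action stands in for the mean of the policy distribution:

```python
import torch

@torch.no_grad()
def controller_output(encoder, policy_net, chart, dyn_code):
    """Deterministic test-time action: most probable discrete action."""
    s = encoder(chart, dyn_code)          # reinforcement learning state code
    logits = policy_net(s)
    return int(logits.argmax(dim=-1))     # index into the discretized action space
```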
An unmanned ship hybrid perception autonomous obstacle avoidance system based on reinforcement learning comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the autonomous obstacle avoidance method when running the computer program.
The above embodiments are merely examples and do not limit the scope of the present invention. These embodiments may be implemented in other various manners, and various omissions, substitutions, and changes may be made without departing from the technical spirit of the present invention.

Claims (10)

1. An unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning is characterized by comprising the following steps:
1) building a marine environment: establishing an interaction rule between the unmanned ship and the marine environment, generating random obstacles, and randomly generating an initial point and a final point of the unmanned ship;
2) setting an action space and a state space: setting an action space according to the situation of the propeller of the unmanned ship, and learning according to global planning information provided by the static chart and obstacle information in the detection radius range of the radar system to obtain a reinforcement learning state code;
3) determining a reward function: setting reward target weight to obtain a comprehensive reward function;
4) establishing and training an evaluation network and a policy network: the evaluation network and the policy network are each formed by connecting the state coding network to a perceptron, and the network parameters are initialized and trained;
5) agent decision and controller output: the reinforcement learning state code is input into the evaluation network and the policy network respectively, the comprehensive reward function is input into the evaluation network, and the controller output is determined as the action corresponding to the mean of the learned policy network.
2. The unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning of claim 1, wherein the interaction rule between the unmanned ship and the marine environment in step 1) follows the unmanned ship's own dynamic equations.
3. The unmanned ship mixed perception autonomous obstacle avoidance method based on reinforcement learning of claim 1, wherein the random obstacles generated in step 1) include 4 kinds: random static obstacles that can be delineated by a chart, random dynamic obstacles that cannot be delineated by a chart, random dynamic obstacles with autonomous control capability, and random dynamic obstacles without autonomous control capability.
4. The unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning of claim 1, wherein the action space in step 2) comprises the discretized surge force and yaw moment.
5. The unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning of claim 1, wherein the reinforcement learning state code in step 2) is obtained through deep network learning, specifically:
the features of the static chart are learned by a convolutional neural network combined with fully connected layers to obtain a static planning state code; the static planning state code and the dynamic obstacle avoidance state code fed back by the radar system processing are taken as the key features of the reinforcement learning state code, and their importance is redistributed through a learned overall weight matrix to obtain the final reinforcement learning state code.
6. The unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning of claim 5, wherein the dynamic obstacle avoidance state coding is as follows:
$$s_t^{dyn} = \left( \sigma_t, \; d_t^{goal}, \; \theta_t^{goal}, \; \psi_t, \; u_t, \; v_t, \; r_t, \; d_t^{obs}, \; \theta_t^{obs} \right)$$

wherein $\sigma_t$ is the obstacle-detected flag within the detection radius, $d_t^{goal}$ is the distance from the unmanned ship to the target in the world coordinate system, $\theta_t^{goal}$ is the angle from the unmanned ship to the target in the world coordinate system, $\psi_t$ is the yaw angle of the unmanned ship in the world coordinate system, $u_t$ is the surge speed in the body-fixed coordinate system of the unmanned ship, $v_t$ is the sway speed, $r_t$ is the yaw rate, $d_t^{obs}$ is the distance to the nearest obstacle in the world coordinate system, $\theta_t^{obs}$ is the angle to the nearest obstacle in the world coordinate system, and the subscript $t$ denotes time $t$.
7. The unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning of claim 1, wherein the comprehensive reward function in step 3) is the product of a reward target weight matrix and the reward targets, the reward targets comprising: a distance reward target, an obstacle avoidance reward target, a speed reward target and an energy consumption reward target.
8. The unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning of claim 7, wherein the reward targets are obtained by:
in the task of navigating the unmanned ship to the target point, if $d_{t+1}^{goal} < d_t^{goal}$, the distance reward target $R_{distance} = 1$; otherwise $R_{distance} = 0$, wherein $d_t^{goal}$ is the distance from the unmanned ship to the target in the world coordinate system, the subscript $t$ denotes time $t$ and the subscript $t+1$ denotes time $t+1$;
when the radar detects an obstacle and the unmanned ship is within the range threatened by the obstacle, if $d_{t+1}^{obs} > d_t^{obs}$, the obstacle avoidance reward target $R_{obstacle} = 1$; otherwise $R_{obstacle} = 0$, wherein $d_t^{obs}$ is the distance to the nearest obstacle in the world coordinate system;
if $\sqrt{u_t^2 + v_t^2} > v_{th}$, the speed reward target $R_{speed} = 1$; otherwise $R_{speed} = 0$, wherein $u_t$ is the surge speed in the body-fixed coordinate system of the unmanned ship, $v_t$ is the sway speed and $v_{th}$ is a set speed threshold;
if $|\tau_u| + |\tau_r| < \tau_{th}$, the energy consumption reward target $R_{consumption} = 1$; otherwise $R_{consumption} = 0$, wherein $\tau_u$ is the surge force of the unmanned ship, $\tau_r$ is the yaw moment of the unmanned ship and $\tau_{th}$ is a set energy consumption threshold.
9. The unmanned ship hybrid perception autonomous obstacle avoidance method based on reinforcement learning of claim 1, wherein step 4) is performed based on an A3C algorithm.
10. An unmanned ship hybrid perception autonomous obstacle avoidance system based on reinforcement learning, comprising a memory, a processor and a computer program which is stored on the memory and can be run on the processor, wherein when the computer program is run by the processor, the autonomous obstacle avoidance method according to any one of claims 1 to 9 is realized.
Application CN202010715076.8A, priority date 2020-07-23, filing date 2020-07-23: Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning. Granted as CN111880535B; legal status: Active.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010715076.8A 2020-07-23 2020-07-23 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning (granted as CN111880535B)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010715076.8A 2020-07-23 2020-07-23 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning (granted as CN111880535B)

Publications (2)

Publication Number Publication Date
CN111880535A 2020-11-03
CN111880535B CN111880535B (en) 2022-07-15

Family

ID=73155952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010715076.8A Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning 2020-07-23 2020-07-23 (granted as CN111880535B, Active)

Country Status (1)

Country Link
CN (1) CN111880535B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319276A (en) * 2017-12-26 2018-07-24 Shanghai Jiaotong University Underwater robot attitude regulation control device and method based on Boolean network
CN108489491A (en) * 2018-02-09 2018-09-04 Shanghai Jiaotong University Three-dimensional track intelligent planning method for an autonomous underwater vehicle
CN109540151A (en) * 2018-03-25 2019-03-29 Harbin Engineering University AUV three-dimensional path planning method based on reinforcement learning
CN110632931A (en) * 2019-10-09 2019-12-31 Harbin Engineering University Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment
CN110775200A (en) * 2019-10-23 2020-02-11 Shanghai Jiaotong University AUV quick laying and recovering device under high sea condition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Chengbo et al.: "Path planning of unmanned ships based on Q-Learning", Ship & Ocean Engineering *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112540614A (en) * 2020-11-26 2021-03-23 Jiangsu University of Science and Technology Unmanned ship track control method based on deep reinforcement learning
CN112540614B (en) * 2020-11-26 2022-10-25 Jiangsu University of Science and Technology Unmanned ship track control method based on deep reinforcement learning
CN112698646B (en) * 2020-12-05 2022-09-13 Northwestern Polytechnical University Aircraft path planning method based on reinforcement learning
CN112698646A (en) * 2020-12-05 2021-04-23 Northwestern Polytechnical University Aircraft path planning method based on reinforcement learning
CN112925319A (en) * 2021-01-25 2021-06-08 Harbin Engineering University Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
CN112925319B (en) * 2021-01-25 2022-06-07 Harbin Engineering University Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
CN113176776A (en) * 2021-03-03 2021-07-27 Shanghai University Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning
CN113176776B (en) * 2021-03-03 2022-08-19 Shanghai University Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning
CN114077258A (en) * 2021-11-22 2022-02-22 Jiangsu University of Science and Technology Unmanned ship pose control method based on reinforcement learning PPO2 algorithm
CN114077258B (en) * 2021-11-22 2023-11-21 Jiangsu University of Science and Technology Unmanned ship pose control method based on reinforcement learning PPO2 algorithm
CN114721409A (en) * 2022-06-08 2022-07-08 Shandong University Underwater vehicle docking control method based on reinforcement learning
CN114942643A (en) * 2022-06-17 2022-08-26 Huazhong University of Science and Technology Construction method and application of USV unmanned ship path planning model
CN114942643B (en) * 2022-06-17 2024-05-14 Huazhong University of Science and Technology Construction method and application of USV unmanned ship path planning model

Also Published As

Publication number Publication date
CN111880535B (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN111880535B (en) Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning
Zhou et al. The review unmanned surface vehicle path planning: Based on multi-modality constraint
Statheros et al. Autonomous ship collision avoidance navigation concepts, technologies and techniques
Perera et al. Experimental evaluations on ship autonomous navigation and collision avoidance by intelligent guidance
Perera et al. Intelligent ocean navigation and fuzzy-Bayesian decision/action formulation
CN112034711B (en) Unmanned ship sea wave interference resistance control method based on deep reinforcement learning
Wang et al. Ship route planning based on double-cycling genetic algorithm considering ship maneuverability constraint
Zhang et al. An adaptive obstacle avoidance algorithm for unmanned surface vehicle in complicated marine environments
CN112925319B (en) Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
Wang et al. Cooperative collision avoidance for unmanned surface vehicles based on improved genetic algorithm
Han et al. A COLREGs-compliant guidance strategy for an underactuated unmanned surface vehicle combining potential field with grid map
Xia et al. Research on collision avoidance algorithm of unmanned surface vehicle based on deep reinforcement learning
Wang et al. Unmanned surface vessel obstacle avoidance with prior knowledge‐based reward shaping
Xu et al. Deep convolutional neural network based unmanned surface vehicle maneuvering
Sun et al. Collision avoidance control for unmanned surface vehicle with COLREGs compliance
Patil et al. Deep reinforcement learning for continuous docking control of autonomous underwater vehicles: a benchmarking study
Hamad et al. Path Planning of Mobile Robot Based on Modification of Vector Field Histogram using Neuro-Fuzzy Algorithm.
Hinostroza et al. Experimental and numerical simulations of zig-zag manoeuvres of a self-running ship model
Hayner et al. HALO: Hazard-aware landing optimization for autonomous systems
Dimitrov et al. Model identification of a small fully-actuated aquatic surface vehicle using a long short-term memory neural network
CN116774712A (en) Real-time dynamic obstacle avoidance method in underactuated AUV three-dimensional environment
Cheng et al. Trajectory optimization for ship navigation safety using genetic annealing algorithm
Wang et al. Deep Reinforcement Learning Based Tracking Control of an Autonomous Surface Vessel in Natural Waters
Yuan et al. EMPMR berthing scheme: A novel event-triggered motion planning and motion replanning scheme for unmanned surface vessels
Ma et al. Cooperative towing for double unmanned surface vehicles connected with a floating rope via vertical formation and adaptive moment control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant