CN108820157B - Intelligent ship collision avoidance method based on reinforcement learning - Google Patents
- Publication number: CN108820157B
- Application number: CN201810378954.4A
- Authority: CN (China)
- Legal status: Active
Classifications
- B — Performing operations; transporting
- B63 — Ships or other waterborne vessels; related equipment
- B63B — Ships or other waterborne vessels; equipment for shipping
- B63B43/00 — Improving safety of vessels, e.g. damage control, not otherwise provided for
- B63B43/18 — Improving safety of vessels; preventing collision or grounding; reducing collision damage
Landscapes
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Combustion & Propulsion (AREA)
- Mechanical Engineering (AREA)
- Ocean & Marine Engineering (AREA)
- Traffic Control Systems (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses an intelligent ship collision avoidance method based on reinforcement learning. The method first acquires static and dynamic data of the two ships; it then checks the validity of the data and determines whether the collision avoidance program needs to be started. Relevant collision avoidance parameters are calculated to judge whether a dangerous situation will arise. If no collision danger arises, own ship keeps its course and speed and proceeds in accordance with the collision avoidance rules. If a collision danger arises, a collision avoidance strategy is learned by reinforcement learning: the calculated parameters are input for training, the output is the strategy generated after training, and the rudder angle the ship needs to turn is obtained. The strategy is then executed, the dynamic data of the two ships in step 1 are updated, and a reward value is returned. After the strategy has been executed, the time to resume the original course is determined according to the collision avoidance rules and the ship resumes navigation. The invention realizes autonomous learning and improvement of ship collision avoidance and avoids the unfavorable situations caused by seafarers' reliance on experience.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence and relates to an intelligent ship collision avoidance method, in particular to an intelligent ship collision avoidance method based on reinforcement learning.
Background
Ship collision avoidance is a problem that cannot be neglected during navigation, and it has many different solutions: AIS-based intelligent collision avoidance decision-making, intelligent algorithms such as evolutionary genetic algorithms, Bayesian-network-based collision avoidance algorithms, and so on. These algorithms have a certain capacity to solve the ship collision avoidance problem, but they also have limitations: they cannot self-learn and improve their collision avoidance strategies.
At present, ship avoidance mainly concerns multiple ships in open waters, and the existing collision avoidance practice in open waters is based chiefly on the International Regulations for Preventing Collisions at Sea. Because the relevant avoidance clauses of these regulations are mostly qualitative descriptions, in the actual avoidance process the ordinary practice of seamen and the ship-handling experience of the officers significantly influence a ship's specific decision scheme and collision avoidance effect.
In practice, ship collision avoidance is mainly controlled by people and depends on the ordinary practice of seamen and the actual ship-handling experience of the officers, which introduces considerable instability.
Disclosure of Invention
To solve the above technical problems, the invention adopts reinforcement learning to optimize the collision avoidance strategy and algorithm, and provides an intelligent ship collision avoidance method based on reinforcement learning that realizes autonomous learning and improvement of ship collision avoidance, avoiding the unfavorable situations caused by seafarers' reliance on experience.
The technical scheme adopted by the invention to solve the above problems is an intelligent ship collision avoidance method based on reinforcement learning, characterized by comprising the following steps:
step 1: acquire the static data and dynamic data of the two ships;
step 2: check the validity of the data, calculate the relevant collision avoidance parameters, judge whether a dangerous situation will arise, and start the collision avoidance program;
step 3: if no collision danger will arise, own ship keeps its course and speed and proceeds in accordance with the collision avoidance rules; if a collision danger will arise, learn a collision avoidance strategy by reinforcement learning, with the calculated parameters as training input and the strategy generated after training as output, obtaining the rudder angle the ship needs to turn;
step 4: execute the strategy generated in step 3, then update the dynamic data of the two ships in step 1 and return a reward value; the reward value is used to evaluate the quality of the collision avoidance strategy;
step 5: after the strategy has been executed, determine the time to resume the original course according to the collision avoidance rules, and resume navigation.
The method has the advantages that reinforcement learning is adopted for strategy optimization, assisting operators and effectively reducing erroneous operations caused by intuition and experience, thereby improving the collision avoidance efficiency of the ship by means of machine learning. Once the strategy is optimized, the optimal strategy learned by the machine can conveniently be provided to operators for reference, so that high-quality decisions can be made and more urgent situations avoided.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the present invention.
Detailed Description
To facilitate understanding and implementation by those of ordinary skill in the art, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described herein are merely illustrative and explanatory and do not limit the invention.
In the field of machine learning, reinforcement learning is an artificial intelligence method. The research team represented by DeepMind first proposed the deep reinforcement learning method based on DQN (Deep Q-Network) and, using a subset of Atari 2600 games as test objects, achieved results exceeding those of human players. In 2012, Lange had already moved toward applications, proposing Deep Fitted Q-learning for vehicle control. Experiments show that such methods are suitable for intelligent control, robotics, analysis, prediction and other fields, and they provide new ideas and opportunities for optimizing ship collision avoidance. The invention fits the actions of human sailors well, and the intelligent ship collision avoidance decision has the characteristics of autonomous learning and improvement.
Referring to fig. 1, the intelligent ship collision avoidance method based on reinforcement learning provided by the invention comprises the following steps:
Step 1: acquire the static data and dynamic data of the two ships;
the static data and the dynamic data of the two ships comprise ship information and target ship information; the ship information comprises ship state, ship gyration index, ship tracking index, track direction, ship heading, ground speed, water speed, longitude, latitude, rudder angle and draught; the target vessel information includes the name of the vessel, the MMSI, the call sign, the type of vessel, the length of the vessel, the width of the vessel, the track direction, the heading direction of the vessel, the speed of the ground, the speed of the water, the longitude, the latitude, the distance, the true azimuth, and the relative azimuth.
Step 2: check the validity of the data, calculate the relevant collision avoidance parameters, judge whether a dangerous situation will arise, and start the collision avoidance program;
the relevant collision avoidance parameters include Time To Close Point of Arrival (TCPA), Distance of Closest arrival (DCPA), safe Distance of safe arrival (SDA), urgent Distance of urgent situation (CQS), urgent Distance of Danger (IMD), Relative motion speed (VR) and Relative motion direction (AR);
whether a dangerous situation arises is then determined: a collision danger exists when TCPA > 0 and DCPA < SDA.
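For illustration, under the standard relative-motion model (a sketch of ours; the patent does not give explicit formulas), TCPA and DCPA can be computed from the target ship's position and velocity relative to own ship, and the danger condition checked as:

```python
import math

def cpa(rel_pos, rel_vel):
    """TCPA and DCPA from the target's relative position and velocity.

    rel_pos: (x, y) of the target relative to own ship
    rel_vel: (vx, vy) of the target relative to own ship
    """
    rx, ry = rel_pos
    vx, vy = rel_vel
    v2 = vx * vx + vy * vy
    if v2 == 0.0:                      # no relative motion: range never changes
        return 0.0, math.hypot(rx, ry)
    tcpa = -(rx * vx + ry * vy) / v2   # time at which the range is minimal
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return tcpa, dcpa

def collision_danger(tcpa, dcpa, sda):
    # Danger condition from the text: TCPA > 0 and DCPA < SDA
    return tcpa > 0 and dcpa < sda

# Target 10 nm dead ahead, closing at 1 kn on a reciprocal course:
tcpa, dcpa = cpa((0.0, 10.0), (0.0, -1.0))
print(tcpa, dcpa)  # 10.0 0.0 -> danger for any positive SDA
```

The coordinates and speeds here are made-up example values; any consistent units work, since only TCPA's sign and DCPA's comparison with SDA matter.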
Step 3: if no collision danger will arise, own ship keeps its course and speed and proceeds in accordance with the collision avoidance rules; if a collision danger will arise, learn a collision avoidance strategy by reinforcement learning, with the calculated parameters as training input and the strategy generated after training as output, obtaining the rudder angle the ship needs to turn;
the method for learning the collision avoidance strategy by applying the reinforcement learning method comprises the following specific steps:
step 4.1: inputting static parameters and dynamic parameters of a ship for training;
step 4.2: inputting various parameters into a Deep Q-learning Network (DQN) to train data; continuously updating the Q value function until the Q function is converged to obtain the best model;
step 4.3: inputting the static parameters and the dynamic parameters of the ship for testing into the trained model;
step 4.4: outputting a rudder angle required to be rotated by the ship;
in the embodiment, static data, dynamic data and marine environment data of two ships are obtained through various sensors and other devices. At the moment, a Markov decision process four-tuple E is generated<S,A,P,R>S is a state set describing the course and the navigation speed of the ship, A is an action set describing the rudder angle which the ship should turn, state transition probabilities are specified for the transition functions;for the reward function, a reward is specified. Existing algorithms typically employ DQN (Deep Q-learning Network) to train the data. First, Q-Table is initialized, the rows and columns are S and A, respectively, and the value of Q-Table is used to measure the quality of the action a taken by the current state S. This embodiment uses the Bellman equation to update the Q-Table during the training process:
Q(s, a) = r + γ·max_a′ Q(s′, a′)
That is, Q(s, a) is the immediate reward r obtained after taking action a in the current state s, plus the maximum future reward max_a′ Q(s′, a′) discounted by γ.
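On a toy problem this update can be sketched in tabular form (a minimal illustration of the Bellman update, not the patent's DQN; the states, actions and rewards are invented for the example):

```python
# Three states in a chain: 0 -> 1 -> 2 (terminal). The only action is
# "starboard"; reaching the terminal state yields reward 1, otherwise 0.
states = [0, 1, 2]
actions = ["starboard"]
gamma = 0.9

Q = {s: {a: 0.0 for a in actions} for s in states}

def step(s, a):
    """Deterministic toy transition: move one state to the right."""
    s_next = min(s + 1, 2)
    r = 1.0 if s_next == 2 else 0.0
    return s_next, r

# One backward sweep of the update Q(s,a) = r + gamma * max_a' Q(s',a')
for s in [1, 0]:
    for a in actions:
        s_next, r = step(s, a)
        Q[s][a] = r + gamma * max(Q[s_next].values())

print(Q[1]["starboard"], Q[0]["starboard"])  # 1.0 0.9
```

The value 0.9 at state 0 shows the discounted reward propagating backward through the table, which is exactly what repeated application of the equation above achieves.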
In the embodiment, the Q-Table is realized by a neural network in the DQN: a state x is input and the Q values of the different actions a are output. The corresponding algorithm is as follows:
1. Use a deep neural network as the Q-value network, with parameters ω:
Q(s, a, ω) ≈ Q^π(s, a)
2. Define the objective function, i.e. the loss function, for the Q value using the mean squared error:
L(ω) = E[(r + γ·max_a′ Q(s′, a′, ω) − Q(s, a, ω))²]
where s′ and a′ are the next state and action (the formulation follows David Silver). The Q value to be updated by Q-learning serves as the target value; with the target value and the current value, the deviation can be computed as the mean squared error.
3. Compute the gradient of the loss function with respect to the parameters ω.
4. Use SGD to achieve end-to-end optimization.
Since the above gradient can be computed from the deep neural network, the parameters can be updated by stochastic gradient descent (SGD) to obtain the optimal Q value.
5. With probability ε randomly select an action a_t, otherwise select the action a_t with the maximum Q value output by the network; then obtain the reward r_t from executing a_t together with the input of the network at the next step; the network computes the next output from the current values, and the process repeats.
After many iterations of training, when the Q value converges to its maximum, a good model has been trained. The trained model is applied to collision avoidance between the two ships: it predicts the optimal collision avoidance strategy, i.e. the rudder angle to turn, in the current urgent situation, assisting the operator to steer the ship and change its state until collision avoidance is finished.
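The target value of step 2 and the ε-greedy selection of step 5 can be sketched as follows (an illustration of the general DQN recipe, not the patent's trained network; the Q values, γ and ε are made up for the example):

```python
import random

def td_target(r, q_next, gamma):
    """Target value r + gamma * max_a' Q(s',a') from the loss L(omega)."""
    return r + gamma * max(q_next)

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Example: Q values for three candidate rudder angles, say -10, 0, +10 degrees
q_next = [0.5, 2.0, 1.0]
target = td_target(1.0, q_next, 0.9)       # 1.0 + 0.9 * 2.0 = 2.8
a = epsilon_greedy(q_next, epsilon=0.0)    # greedy: index 1
squared_error = (target - q_next[a]) ** 2  # one term of the MSE loss
```

In a full implementation the squared error would be averaged over a batch and backpropagated through the Q network, which is what steps 3 and 4 describe.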
Step 4: execute the strategy generated in step 3, then update the dynamic data of the two ships in step 1 and return a reward value; the reward value is used to evaluate the quality of the collision avoidance strategy;
the reward value comprises minimum flight path offset, shortest avoidance time, shortest avoidance path, shortest avoidance amplitude and minimum avoidance amplitude; the quality of the strategy depends on accumulated reward obtained after the strategy is executed for a long time, and the strategy can be continuously optimized when the Q value representing the reward is converged to the maximum value after a plurality of iterations and training are carried out in the training process.
Step 5: after the strategy has been executed, determine the time to resume the original course according to the collision avoidance rules, and resume navigation.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (3)
1. An intelligent ship collision avoidance method based on reinforcement learning is characterized by comprising the following steps:
step 1: acquire the static data and dynamic data of the two ships;
step 2: check the validity of the data, calculate the relevant collision avoidance parameters, judge whether a dangerous situation will arise, and start the collision avoidance program;
step 3: if no collision danger will arise, own ship keeps its course and speed and proceeds in accordance with the collision avoidance rules; if a collision danger will arise, learn a collision avoidance strategy by reinforcement learning, with the calculated parameters as training input and the strategy generated after training as output, obtaining the rudder angle the ship needs to turn;
in step 3, the reinforcement learning of the collision avoidance strategy is implemented in the following substeps:
step 3.1: input the static and dynamic parameters of the ships for training;
step 3.2: feed the parameters into the reinforcement learning DQN to train on the data, continuously updating the Q-value function until it converges, so as to obtain the best model;
first, the static data, dynamic data and marine environment data of the two ships are obtained, and a Markov decision process quadruple E = <S, A, P, R> is generated, where S is the state set describing own ship's course and speed, A is the action set describing the rudder angle the ship should turn, P: S × A × S → [0, 1] is the transition function specifying the state transition probabilities, and R: S × A → R is the reward function specifying the reward;
the data are trained with the DQN; first the Q-Table is initialized, with rows and columns indexed by S and A respectively, the value in the Q-Table measuring the quality of taking action a in the current state s; the Q-Table is updated during training using the Bellman equation:
Q(s, a) = r + γ·max_a′ Q(s′, a′)
wherein Q(s, a) is the immediate reward r obtained after taking action a in the current state s, plus the maximum future reward max_a′ Q(s′, a′) discounted by γ;
the Q-Table is realized in the DQN by a neural network: a state x is input and the Q values of the different actions a are output; the specific implementation process is as follows:
(1) use a deep neural network as the Q-value network, with parameters ω:
Q(s, a, ω) ≈ Q^π(s, a);
(2) define the objective function, i.e. the loss function, for the Q value using the mean squared error:
L(ω) = E[(r + γ·max_a′ Q(s′, a′, ω) − Q(s, a, ω))²]
wherein s′ and a′ are the next state and action (the formulation follows David Silver), the Q value to be updated by Q-learning serving as the target value;
(3) compute the gradient of the loss function with respect to the parameters ω;
(4) use SGD to achieve end-to-end optimization;
since the above gradient can be computed from the deep neural network, the parameters are updated by stochastic gradient descent (SGD) to obtain the optimal Q value;
(5) with probability ε randomly select an action a_t, otherwise select the action a_t with the maximum Q value output by the network; then obtain the reward r_t from executing a_t together with the input of the network at the next step; the network computes the next output from the current values, and the process repeats;
step 3.3: input the static and dynamic parameters of the ships for testing into the trained model;
step 3.4: output the rudder angle the ship needs to turn;
and 4, step 4: executing the strategy generated in the step 3, then dynamically updating the dynamic data of the two ships in the step 1, and returning a reward value; the reward value is used for evaluating the quality of the collision avoidance strategy;
the reward value comprises a minimum track offset, a shortest avoidance time, a shortest avoidance path and a minimum avoidance amplitude; the quality of the strategy depends on accumulated reward obtained after the strategy is executed for a long time, and the strategy can be continuously optimized when the Q value representing the reward is converged to the maximum value after a plurality of iterations and trainings are carried out in the training process;
and 5, step 5: after strategy execution is finished, determining a re-navigation opportunity according to a collision avoidance rule and then re-navigating.
2. The intelligent ship collision avoidance method based on reinforcement learning of claim 1, wherein in step 1 the static and dynamic data of the two ships comprise own-ship information and target-ship information; the own-ship information comprises the ship state, turning index, course-keeping index, course over ground, heading, speed over ground, speed through water, longitude, latitude, rudder angle and draught; the target-ship information comprises the ship name, MMSI, call sign, ship type, length, breadth, course over ground, heading, speed over ground, speed through water, longitude, latitude, distance, true bearing and relative bearing.
3. The intelligent ship collision avoidance method based on reinforcement learning of claim 1, wherein in step 2 the relevant collision avoidance parameters comprise the time to the closest point of approach TCPA, the distance at the closest point of approach DCPA, the safe distance of approach SDA, the close-quarters situation distance CQS, the immediate danger distance IMD, the relative speed VR and the relative course AR;
a collision danger arises when TCPA > 0 and DCPA < SDA.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810378954.4A CN108820157B (en) | 2018-04-25 | 2018-04-25 | Intelligent ship collision avoidance method based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108820157A CN108820157A (en) | 2018-11-16 |
CN108820157B true CN108820157B (en) | 2020-03-10 |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant