CN111829527B - Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements - Google Patents

Info

Publication number
CN111829527B
CN111829527B (application CN202010717418.XA)
Authority
CN
China
Prior art keywords
unmanned ship
network
time
target
obstacle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010717418.XA
Other languages
Chinese (zh)
Other versions
CN111829527A (en)
Inventor
曾喆
杜沛
刘善伟
万剑华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN202010717418.XA
Publication of CN111829527A
Application granted
Publication of CN111829527B
Legal status: Active

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00-G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G01C21/203 - Specially adapted for sailing ships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements, comprising the following basic steps: S1, interpolate the wind, wave and current data of the target sea area and add obstacle, starting point and end point information; S2, evaluate with a Bayesian network the maximum wind, wave and current values the unmanned ship can bear; S3, reorganize the target sea area AIS data to train the network, obtaining an optimized experience pool and preliminary network parameters; S4, input the unmanned ship state feature vectors into the deep reinforcement learning module for algorithm iteration, updating network parameters and outputting actions; S5, the unmanned ship moves for 15 s per iteration, and the environmental data are updated each time the accumulated time reaches 1 h; S6, when the unmanned ship reaches the target point, end the iteration and output the path. The invention fully considers the influence of marine environment elements on unmanned ship navigation, better matches the actual conditions of long-distance unmanned ship voyages, and accounts simultaneously for environmental elements and obstacle information under severe sea conditions to obtain a high-quality, safe path.

Description

Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements
Technical Field
The patent relates to the field of unmanned ship path planning, and in particular to an unmanned ship path planning method based on deep reinforcement learning that considers marine environment elements.
Background
Driven by the development of artificial intelligence control technology, unmanned ships have achieved breakthroughs in many technical fields, gradually entered public view, begun to take on tasks such as ocean exploration and data acquisition, and are steadily expanding into the marine operations industry.
The presently published patents CN109657863A, CN109726866A and CN107289939A all provide good path planning methods in this field, but they generally consider only the influence of obstacles on the unmanned ship. Every unmanned ship has a limit on the wind and waves it can bear, determined by factors such as its material, structure and draught; when affected by strong wind and heavy waves in a real sea area, it risks capsizing or overturning. Avoiding dangerous marine environment elements and obstacle areas while sailing is therefore extremely important for navigation safety, and especially so for marine transport unmanned ships.
The method considers the influence of marine environment elements on unmanned ship navigation: the marine environment elements and the obstacle information around the unmanned ship serve as the feature input vector of deep reinforcement learning, and an attention matrix highlights the elements that most influence the algorithm's output at each moment. Compared with collision-avoidance reinforcement learning methods, the reward value here is not a fixed value but varies with the combined degree of influence of the marine environment elements and the obstacles on the unmanned ship. The method is thus better suited to the actual conditions of unmanned ship navigation, and obtains a high-quality safe path by considering environmental elements and obstacle information throughout the voyage.
Disclosure of Invention
(I) Objects of the invention
Aiming at the problem that many currently proposed unmanned ship path planning methods do not consider marine environment elements, the invention provides an unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements, which fully accounts for real marine environment elements and marine obstacles and combines them with a deep reinforcement learning method to plan a safe and efficient route for the unmanned ship.
(II) Technical scheme
In order to achieve the above purpose, the technical scheme of the invention is as follows: an unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements, comprising the following specific steps:
(1) interpolating the wind speed, flow speed and wave height data of the target sea area at time t into a 200 m × 200 m grid, and using s_t to describe the characteristic state vector of the unmanned ship at time t, namely:

s_t = [e_t^wind, e_t^wave, e_t^cur, d_t^obs]

wherein e_t^wind, e_t^wave and e_t^cur respectively represent the wind speed, wave height and flow velocity at the unmanned ship's position at time t, and d_t^obs is the distance between the unmanned ship and the obstacle at time t; d_t^obs = NAN indicates that the unmanned ship has not detected an obstacle;
(2) evaluating the unmanned ship's ability to resist wind, waves and currents using a Bayesian network, with the unmanned ship's material, displacement, length, width and height as inputs; the output is [e_max^wind, e_max^wave, e_max^cur], three parameters respectively representing the maximum wind speed, wave height and flow velocity the unmanned ship can bear, used for calculating the reward function;
(3) initializing a deep reinforcement learning model, specifically comprising: two identical LSTM networks (serving respectively as the target Q network and the actual Q network), a reward function model, a model experience pool, and an action output set;
(4) retaining three attributes of the target sea area's real AIS data (coordinates, course and speed), superposing the three marine environment element values and the obstacle information onto the AIS data by time and point position, and putting the new AIS data into the deep reinforcement learning model as training samples to obtain an optimized experience pool and preliminary network parameters;
(5) setting the start and end coordinates of the unmanned ship's voyage, obtaining the state feature vector s_t of the unmanned ship at time t, and inputting it into both the actual Q network and the reward function model;
wherein: the actual Q network computes Q_actual and, following an ε-greedy strategy, outputs the action corresponding to Q_actual; the reward function model calculates the reward value R_t of the current iteration; the target Q network randomly extracts n records from the experience pool and combines them with R_t to calculate Q_target; Q_actual and Q_target together form the loss function, the network parameters of the actual Q network are updated by gradient descent, and when the number of iterations reaches a threshold α, all parameters of the actual Q network are copied to the target Q network;
(6) each motion of the unmanned ship lasts 15 seconds; when the accumulated motion time reaches 1 h, the wind speed, ocean current, wave height and obstacle information of the sea area are updated to the current time;
(7) when the unmanned ship reaches the target point, the iteration ends and the safe path is output.
Specifically, the Bayesian network construction method in step (2) includes the following steps:
(2.1) the nodes of the unmanned ship evaluation Bayesian network include: material, water displacement, length, width and height as bottom-layer nodes, and wind-resistance level, wave-resistance level and flow-resistance level as high-level nodes; the bottom-layer nodes are fully connected to the high-level nodes;
(2.2) training a Bayesian network by taking the unmanned ship structure data as a sample to obtain a conditional probability table of each node;
(2.3) inputting the unmanned ship information to be evaluated, including material, water displacement, length, width and height; calculating the probability of each level of the three high-level nodes according to the conditional probability tables, and outputting the maximum-probability level as the final value;
(2.4) mapping the wind-speed, wave-height and flow-velocity levels of the unmanned ship obtained through the Bayesian network to specific numerical values according to the corresponding sea-state scale, as the values of e_max^wind, e_max^wave and e_max^cur.
Specifically, in the reward function model described in step (3), the reward value R_t is calculated as:

R_t = softmax(|(θ_safe - s_t)·w_1|) · ((θ_safe - s_t)·w_2)^T

wherein s_t is the characteristic state vector of the unmanned ship at time t and θ_safe is a safety threshold vector containing four parameters, θ_safe = [e_max^wind, e_max^wave, e_max^cur, -d_detect], in which e_max^wind, e_max^wave and e_max^cur are the maximum wind speed, wave height and flow velocity the unmanned ship can bear, obtained in step (2), and d_detect is the unmanned ship's obstacle-sensing range. w_1, the attention matrix of the reward function, is a 4 × 4 upper-triangular constant square matrix whose diagonal elements W_ii (i = 1, 2, 3, 4) correspond respectively to the degree of influence of wind speed, wave height, ocean current and obstacles on path planning, and whose off-diagonal elements W_ij represent the correlation between element i and element j; this matrix scales marine environment element values of different orders of magnitude to a common order for comparison and highlights the key elements. w_2 is a 4 × 4 diagonal matrix that combines with the (θ_safe - s_t) part to give the final reward value R_t its sign while amplifying its magnitude to facilitate decision making;
the softmax(|(θ_safe - s_t)·w_1|) part calculates the coefficients of the reward function and is responsible for weighting each element value; the weights highlight the elements most important to the decision in each iteration, so the reward value drops rapidly when an element value surges or at the moment an obstacle is detected. When the unmanned ship senses no obstacle, the reward function guides the model to avoid high wind-and-wave areas; as soon as an obstacle is sensed, it drives a collision-avoidance action at the first moment.
(III) Advantageous effects
The advantages of the invention are embodied as follows:
1. The wind speed, wave height, ocean current velocity and obstacle information together serve as the main references for planning the unmanned ship's path, making the planned path more feasible; during calculation, the data are updated according to the unmanned ship's running time, which ensures the reliability of the path planning result.
2. The designed reward function highlights the elements most important to the decision in each iteration, while accounting for the ship's detection capability and its capacity to withstand wind and wave impact; it gives rewards in safe areas, appropriate penalties in dangerous areas, and makes an avoidance decision at the first moment an obstacle is detected, which improves the method's path planning efficiency and optimizes its results.
3. A Bayesian network method for evaluating the unmanned ship's wind and wave resistance is proposed, replacing the conventional practice of assigning resistance levels from expert experience; it is more scientific and efficient.
Drawings
FIG. 1 is a flow chart of a method for planning a unmanned ship route based on deep reinforcement learning and considering marine environment elements
FIG. 2 is a schematic diagram of a Bayesian network for evaluating the capability of an unmanned ship in resisting wind, wave and current
FIG. 3 is a flow chart of a deep reinforcement learning algorithm used by the model
FIG. 4 is a schematic diagram of path planning under the influence of elements and obstacles in marine environment
Detailed Description
The invention will now be described more fully and clearly with reference to the accompanying drawings and examples:
FIG. 1 is a flow chart of the unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements. By fully considering the unmanned ship's material and structure together with the strong winds, heavy waves, ocean currents and obstacles that may exist in the sea area, the method gives a reasonable solution for safely completing the unmanned ship's navigation task. It consists of two modules: the first is a Bayesian network evaluation module for assessing the unmanned ship's wind and wave resistance, and the second is a deep reinforcement learning route planning module that considers marine environment elements. The two modules are coupled through the deep reinforcement learning reward function, so the unmanned ship can make appropriate risk-avoidance decisions according to its own material and structure. The method is suitable for planning paths for unmanned ships executing long-range missions.
Specifically, the method comprises the following steps:
(1) The wind speed, flow speed and wave height data of the target sea area at time t, together with the forecast wind speed, ocean current and wave height data after time t, are jointly interpolated into a 200 m × 200 m grid using the kriging method; the 200 m × 200 m cell size is chosen so that the unmanned ship obtains a new element value after at most three movements. The data are stored in a three-dimensional array whose three dimensions are longitude, latitude and time, with a time interval of 1 h. At time t, s_t describes the characteristic state vector of the unmanned ship, namely:

s_t = [e_t^wind, e_t^wave, e_t^cur, d_t^obs]

wherein e_t^wind, e_t^wave and e_t^cur respectively represent the wind speed, wave height and flow velocity at the unmanned ship's position at time t, and d_t^obs is the distance between the unmanned ship and the obstacle at time t; d_t^obs = NAN indicates that the unmanned ship has not detected an obstacle;
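To make this data layout concrete, a minimal Python sketch follows (not from the patent: the array shapes, grid origin handling and function names are assumptions); it stores each interpolated element in a three-dimensional array indexed by hourly time slice, latitude row and longitude column, and assembles s_t by direct lookup:

```python
# Illustrative sketch only: 3-D storage of interpolated marine data and lookup of
# the state vector s_t. Shapes, origin handling and helper names are assumptions.
import numpy as np

CELL = 200.0       # 200 m x 200 m grid cells, as in step (1)
SLICE = 3600.0     # environmental data are kept in 1 h time slices

# element[k, i, j]: value in hour k at grid row i (latitude), column j (longitude)
wind = np.zeros((24, 100, 100))
wave = np.zeros((24, 100, 100))
cur = np.zeros((24, 100, 100))

def state_vector(t, x, y, d_obs=np.nan):
    """s_t = [wind speed, wave height, flow velocity, obstacle distance];
    d_obs = NaN encodes 'no obstacle detected', as in the text."""
    k = int(t // SLICE)                    # hourly time-slice index
    i, j = int(y // CELL), int(x // CELL)  # grid cell containing the ship
    return np.array([wind[k, i, j], wave[k, i, j], cur[k, i, j], d_obs])

s_t = state_vector(t=7200.0, x=5000.0, y=8000.0)  # third hour, no obstacle sensed
```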
(2) The unmanned ship's ability to resist wind, waves and currents is evaluated using a Bayesian network; the inputs are the unmanned ship's material, displacement, length, width and height, and the output is [e_max^wind, e_max^wave, e_max^cur], three parameters respectively representing the maximum wind speed, wave height and flow velocity the unmanned ship can bear, used for calculating the reward function. The specific steps for constructing the Bayesian network are as follows:
(2.1) FIG. 2 is a schematic diagram of the Bayesian network for evaluating the unmanned ship's ability to resist wind, waves and currents. Its nodes comprise: material, water displacement, length, width and height as bottom-layer nodes; wind-speed resistance level, wave-height resistance level and flow-velocity resistance level as high-level nodes; the bottom-layer nodes are fully connected to the high-level nodes;
(2.2) The Bayesian network is trained with unmanned ship structure information as samples; the data must first be discretized. The unmanned ship structure table (Table 1) is provided as an image in the original publication. The data are put into the Bayesian network for training to obtain the conditional probability table of each node.
(2.3) The unmanned ship information to be evaluated is input, including material, water displacement, length, width and height; the probability of each level of the three high-level nodes is calculated from the conditional probability tables, and the maximum-probability level is output as the final value.
(2.4) The wind-speed, wave-height and flow-velocity levels of the unmanned ship obtained through the Bayesian network are mapped to specific numerical values according to the corresponding sea-state scale, as the values of e_max^wind, e_max^wave and e_max^cur.
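As the patent publishes no code for this evaluation, the following Python sketch shows one plausible realization of steps (2.2)-(2.3): conditional probability tables estimated by frequency counting over discretized hull records, and a maximum-probability readout. The field names and the WIND_MAX level-to-value mapping are hypothetical stand-ins for the sea-state scale of step (2.4):

```python
# Illustrative sketch: frequency-count CPTs for a high-level node whose parents are
# the five fully connected bottom-layer nodes, then a maximum-probability readout.
from collections import Counter, defaultdict

PARENTS = ('material', 'displacement', 'length', 'width', 'height')

def fit_cpt(samples, level_key):
    """samples: dicts of discretized bottom-layer values plus the observed level."""
    cpt = defaultdict(Counter)
    for rec in samples:
        cpt[tuple(rec[p] for p in PARENTS)][rec[level_key]] += 1
    return cpt

def most_probable_level(cpt, ship):
    counts = cpt[tuple(ship[p] for p in PARENTS)]
    if not counts:
        raise KeyError('no training records for this parent configuration')
    return max(counts, key=counts.get)   # the maximum-probability level

# Hypothetical mapping from wind-resistance level to a concrete maximum (m/s),
# standing in for the sea-state scale referenced in step (2.4):
WIND_MAX = {1: 1.5, 2: 3.3, 3: 5.4, 4: 7.9}
```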
(3) The deep reinforcement learning model is initialized, specifically comprising: two identical LSTM networks (serving respectively as the target Q network and the actual Q network), a reward function model, a model experience pool, and an action output set;
Specifically, in the reward function model, the reward value R_t is calculated as:

R_t = softmax(|(θ_safe - s_t)·w_1|) · ((θ_safe - s_t)·w_2)^T

wherein s_t is the characteristic state vector of the unmanned ship at time t and θ_safe is a safety threshold vector containing four parameters, θ_safe = [e_max^wind, e_max^wave, e_max^cur, -d_detect], in which e_max^wind, e_max^wave and e_max^cur are the maximum wind speed, wave height and flow velocity the unmanned ship can bear, obtained from the Bayesian network evaluation, and d_detect is the unmanned ship's collision-avoidance sensing range (the negative sign in front is added for convenience of calculation). The weight matrix w_1 is a 4 × 4 symmetric constant square matrix whose diagonal elements W_ii (i = 1, 2, 3, 4) correspond respectively to the degree of influence of wind speed, wave height, ocean current and obstacles on path planning, and whose off-diagonal elements W_ij represent the correlation between element i and element j; this matrix scales marine environment element values of different orders of magnitude to a common order for comparison and highlights the key elements. The values of w_1 are given empirically (the specific matrix appears as an image in the original publication).
w_2 is a 4 × 4 diagonal matrix that combines with the (θ_safe - s_t) part to give the final reward value R_t its sign while amplifying its magnitude to speed up decision making; its specific values likewise appear as an image in the original publication.
The softmax(|(θ_safe - s_t)·w_1|) part calculates the coefficients of the reward function and is responsible for weighting each characteristic state element; the weights highlight the elements most important to the decision in each iteration, and the reward value drops rapidly when an element value surges or at the moment an obstacle is detected. The (θ_safe - s_t)·w_2 part attaches a positive or negative sign to the calculation result, indicating a reward or a penalty. The calculation of the reward function is illustrated below for two cases, without and with an obstacle:
when no obstacle is encountered:
suppose a certain time tnCharacteristic state vector of
Figure BDA0002598732400000058
Comprises the following steps:
Figure BDA0002598732400000059
unmanned ship safety threshold vector thetasafe=[3,1.5,0.2,500]NAN represents not participating in the calculation, then softmax (| (θ)safe-st)·w1I) the calculation result is [0.867,0.117,0.016,0 |)]It means that in this calculation, the marine factor "wind speed" needs attention; (theta)safe-st)·w2The part and the weight value are subjected to multiplication, and positive and negative are attached to a calculation result to indicate that reward or punishment is given; the final calculation result of-19.95 represents that punishment is made.
When an obstacle is encountered:
suppose the characteristic state vector s_{t_n} at a certain time t_n indicates that an obstacle is detected 50 m from the unmanned ship, i.e. the unmanned ship has just sensed the obstacle, with the unmanned ship safety threshold vector θ_safe as above (its full values appear as an image in the original publication). Then softmax(|(θ_safe - s_t)·w_1|) evaluates to [0, 0, 0, 1], meaning that in this calculation avoiding the obstacle is most important; the (θ_safe - s_t)·w_2 part is dot-multiplied with these weights and a sign is attached to the result; the final result of -200 represents a penalty.
Through the above calculations, the algorithm drives the unmanned ship via the reward function, focusing on marine environment elements when no obstacle is detected and reacting at the first moment when an obstacle is detected;
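For concreteness, a runnable Python sketch of this reward computation follows; the w_1 and w_2 matrices and the state vector are placeholder assumptions (the patent's empirical matrices appear only as images), so the printed value illustrates the sign behaviour rather than reproducing -19.95:

```python
# Minimal sketch of R_t = softmax(|(θ_safe - s_t)·w1|)·((θ_safe - s_t)·w2)^T.
# w1, w2 and s_t below are placeholders, not the patent's empirical values.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def reward(theta_safe, s_t, w1, w2):
    d = theta_safe - s_t
    d = np.where(np.isnan(d), 0.0, d)    # NAN component does not participate
    coeff = softmax(np.abs(d @ w1))      # attention weights over the 4 elements
    return float(coeff @ (d @ w2))       # sign distinguishes reward from penalty

theta_safe = np.array([3.0, 1.5, 0.2, 500.0])  # thresholds of the first example
w1 = np.diag([10.0, 10.0, 10.0, 0.5])          # hypothetical attention matrix
w2 = np.diag([5.0, 5.0, 5.0, 0.4])             # hypothetical scaling matrix
s_t = np.array([5.5, 1.0, 0.1, np.nan])        # assumed state: strong wind, no obstacle
print(reward(theta_safe, s_t, w1, w2))         # negative value => penalty
```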
(4) Retain three attributes of the target sea area's real AIS data: coordinates, course and speed; superpose the three marine environment element values and the obstacle information onto the AIS data by time and point position. A sample of the new AIS data (Table 2) is provided as an image in the original publication. The newly organized AIS data are put into the deep reinforcement learning model as training samples to obtain an optimized experience pool and preliminary network parameters;
(5) With the unmanned ship's running speed fixed at v = 10 m/s, the discretized heading angle is selected as the action output of deep reinforcement learning. Considering the ship's steering capacity, the heading-change range is limited to between 35° and -35° and discretized at equal intervals, giving the action set output by the model:
A = {35, 25, 15, 5, -5, -15, -25, -35}
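As an illustration of how one discrete action could move the ship, the sketch below applies a heading change at the fixed speed v = 10 m/s; the flat-plane motion model and the 10 s step are assumptions (the summary of the invention uses 15 s per action, while step (11) below uses 10 s):

```python
# Illustrative kinematics sketch: apply one discrete heading-change action at the
# fixed speed v = 10 m/s. The flat-plane motion model is an assumption.
import math

ACTIONS = [35, 25, 15, 5, -5, -15, -25, -35]  # heading changes (degrees), set A
V, STEP = 10.0, 10.0                          # speed (m/s), seconds per action

def apply_action(x, y, heading_deg, action_idx):
    heading_deg = (heading_deg + ACTIONS[action_idx]) % 360.0
    rad = math.radians(heading_deg)
    return x + V * STEP * math.cos(rad), y + V * STEP * math.sin(rad), heading_deg

x, y, hdg = apply_action(0.0, 0.0, 90.0, action_idx=2)  # apply a +15° heading change
```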
(6) Referring to FIG. 3, the flow chart of the deep reinforcement learning algorithm: two identical LSTM networks serve respectively as the actual Q network and the target Q network in the deep reinforcement learning framework. The state feature vector s_t of the unmanned ship at time t is obtained and input into both the actual Q network and the reward function model. At time t, the LSTM input layer of the actual Q network receives the feature state vector s_t and the actual Q network's output Q(s_{t-1})_actual from the previous moment; the output layer gives the value Q(s_t)_actual, after which the action a_t (a_t ∈ A) corresponding to the Q value is selected using the ε-greedy strategy;
(7) The reward value R_t at time t is calculated; the characteristic state vector s_t, the action a_t, the feature state vector s_t' after executing a_t, and the Boolean value is_end that indicates whether the iteration has terminated are stored together as one record rec_t = {s_t, a_t, R_t, s_t', is_end} in experience pool D;
(8) n records {s_i, a_i, R_i, s_i', is_end_i}, i = 1, 2, …, n, are randomly extracted from experience pool D, and the target Q value Q_target is calculated:

Q_target,i = R_i, if is_end_i is true;
Q_target,i = R_i + γ·Q(s_i', a_max(s_i', ω); ω'), otherwise;

wherein R_i is the reward value of the i-th record, γ is the discount factor (γ = 0.9 in this example), ω is the parameter of the actual Q network, ω' is the parameter of the target Q network, and a_max(s_i', ω) is the action selected by feeding record i back into the actual Q network:

a_max(s_i', ω) = argmax_a Q(s_i', a; ω)

wherein s_i', a_i and ω are respectively the state feature vector, the action and the network parameter associated with record i;
(9) The accumulated loss over the n records is calculated, and the parameter ω of the actual Q network is updated by gradient descent, using the loss function:

L(ω) = (1/n) · Σ_{i=1}^{n} (Q_target,i - Q(s_i, a_i; ω))²
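A hedged PyTorch sketch of steps (8) and (9) follows: it samples n records, builds the target Q value with the action chosen by the actual network but evaluated by the target network, and takes one gradient-descent step on the mean squared loss. The record layout mirrors rec_t above; for brevity, the LSTM recurrence of step (6) is replaced by a plain state-to-Q-values network call, so this is a sketch of the update rule, not of the patent's exact architecture:

```python
# Hedged sketch of steps (8)-(9): Double-DQN-style target and gradient update.
# The flat (state -> Q values) network call stands in for the LSTM described above.
import random
import torch
import torch.nn as nn

GAMMA = 0.9  # discount factor, as given in the text

def train_step(actual_q, target_q, optimizer, pool, n):
    batch = random.sample(pool, n)
    s = torch.stack([b['s'] for b in batch])       # (n, 4) state vectors
    a = torch.tensor([b['a'] for b in batch])      # chosen action indices
    r = torch.tensor([b['R'] for b in batch])      # reward values R_i
    s2 = torch.stack([b['s2'] for b in batch])     # successor states s_i'
    done = torch.tensor([b['is_end'] for b in batch])

    with torch.no_grad():
        a_max = actual_q(s2).argmax(dim=1)         # a_max(s', ω): actual network
        q_next = target_q(s2).gather(1, a_max.unsqueeze(1)).squeeze(1)  # Q(s', a_max; ω')
        q_tgt = torch.where(done, r, r + GAMMA * q_next)  # target Q value

    q_act = actual_q(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s_i, a_i; ω)
    loss = nn.functional.mse_loss(q_act, q_tgt)    # (1/n)·Σ(Q_target - Q(s,a;ω))²
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```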
(10) When the number of iterations of the actual Q network reaches the threshold α, the parameters ω of the actual Q network are copied wholesale to the target Q network.
(11) Each motion of the unmanned ship lasts 10 seconds; when the accumulated motion time reaches 1 h, the wind speed, ocean current, wave height and obstacle information of the sea area are updated to the current time;
(12) When the unmanned ship reaches the end point, the iteration ends and a safe path is output.
FIG. 4 is a schematic diagram of path planning under the influence of marine environment elements and obstacles; the method avoids high-risk marine environment areas and obstacles when planning a path.
The above is an embodiment of the present invention; all changes made according to the technical scheme of the invention that produce its functional effects without exceeding its technical scheme belong to the protection scope of the invention.

Claims (1)

1. An unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements is characterized by comprising the following steps:
(1) interpolating the wind speed, flow speed and wave height data of the target sea area at time t into a 500 m × 500 m grid, and using s_t to describe the characteristic state vector of the unmanned ship at time t, namely:

s_t = [e_t^wind, e_t^wave, e_t^cur, d_t^obs]

wherein e_t^wind, e_t^wave and e_t^cur respectively represent the wind speed, wave height and flow velocity at the unmanned ship's position at time t, and d_t^obs is the distance between the unmanned ship and the obstacle at time t; d_t^obs = NAN indicates that the unmanned ship has not detected an obstacle;
(2) evaluating the unmanned ship's ability to resist wind, waves and currents using a Bayesian network, with the unmanned ship's material, displacement, length, width and height as inputs; the output is [e_max^wind, e_max^wave, e_max^cur], three parameters respectively representing the maximum wind speed, wave height and flow velocity the unmanned ship can bear;
1) constructing the Bayesian network nodes, comprising: material, water displacement, length, width and height as bottom-layer nodes, and wind-resistance level, wave-resistance level and flow-resistance level as high-level nodes; the bottom-layer nodes are fully connected to the high-level nodes;
2) training a Bayesian network by taking the unmanned ship structure data as a sample to obtain a conditional probability table of each node;
3) inputting unmanned ship information to be evaluated, comprising: material, water displacement, length, width and height, calculating the probability of each level of the three high-level nodes according to the conditional probability table, and outputting the maximum probability level as a final value;
4) mapping the wind-speed, wave-height and flow-velocity levels of the unmanned ship obtained through the Bayesian network to specific numerical values according to the corresponding sea-state scale, as the values of e_max^wind, e_max^wave and e_max^cur;
(3) initializing a deep reinforcement learning model, specifically comprising: two identical LSTM networks as a target Q network and an actual Q network, a reward function model, a model experience pool, and an action output set, wherein the reward value R_t is calculated as:

R_t = softmax(|(θ_safe - s_t)·w_{4×4}|) · ((θ_safe - s_t)·w_{4×4})^T
in the formula, s_t is the characteristic state vector of the unmanned ship at time t and θ_safe is a safety threshold vector containing four parameters, θ_safe = [e_max^wind, e_max^wave, e_max^cur, -d_detect], wherein e_max^wind, e_max^wave and e_max^cur are the maximum wind speed, wave height and flow velocity borne by the unmanned ship, obtained in step (2), and d_detect is the collision-avoidance sensing range of the unmanned ship; the weight matrix w_{4×4}, the attention matrix of the reward function, is a 4 × 4 upper-triangular constant square matrix whose diagonal elements W_ii correspond respectively to the degree of influence of wind speed, wave height, ocean current and obstacles on path planning, i = 1, 2, 3, 4, and whose off-diagonal elements W_ij represent the correlation between element i and element j;
the softmax(|(θ_safe - s_t)·w_{4×4}|) part calculates the coefficients of the reward function, giving a weight to each characteristic state element; the weights highlight the elements most important to the decision in each iteration and make the reward value decrease rapidly when an element value surges or at the moment an obstacle is detected; when the unmanned ship senses no obstacle, the reward function guides the model to avoid high wind-and-wave areas, and when an obstacle is sensed it performs a collision-avoidance action at the first moment;
(4) retaining three attributes of the target sea area's real AIS data (coordinates, course and speed), superposing the three marine environment element values and the obstacle information onto the AIS data by time and point position, and putting the new AIS data into the deep reinforcement learning model as training samples to obtain an optimized experience pool and preliminary network parameters;
(5) setting the start and end coordinates of the unmanned ship's voyage, obtaining the state feature vector s_t of the unmanned ship at time t, and inputting it into both the actual Q network and the reward function model;
wherein: the actual Q network computes Q_actual and, following an ε-greedy strategy, outputs the action corresponding to Q_actual; the reward function model calculates the reward value R_t of the current iteration; the target Q network randomly extracts n records from the experience pool and combines them with R_t to calculate Q_target; Q_actual and Q_target together form the loss function, the network parameters of the actual Q network are updated by gradient descent, and when the number of iterations reaches a threshold α, all parameters of the actual Q network are copied to the target Q network;
(6) each motion of the unmanned ship lasts 15 seconds; when the accumulated motion time reaches 1 h, the wind speed, ocean current, wave height and obstacle information of the sea area are updated to the current time;
(7) when the unmanned ship reaches the target point, the iteration ends and the safe path is output.
CN202010717418.XA 2020-07-23 2020-07-23 Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements Active CN111829527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010717418.XA CN111829527B (en) 2020-07-23 2020-07-23 Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010717418.XA CN111829527B (en) 2020-07-23 2020-07-23 Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements

Publications (2)

Publication Number Publication Date
CN111829527A CN111829527A (en) 2020-10-27
CN111829527B true CN111829527B (en) 2021-07-20

Family

ID=72925135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010717418.XA Active CN111829527B (en) 2020-07-23 2020-07-23 Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements

Country Status (1)

Country Link
CN (1) CN111829527B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112180950B (en) * 2020-11-05 2022-07-08 武汉理工大学 Intelligent ship autonomous collision avoidance and path planning method based on reinforcement learning
CN112698646B (en) * 2020-12-05 2022-09-13 西北工业大学 Aircraft path planning method based on reinforcement learning
CN112580801B (en) * 2020-12-09 2021-10-15 广州优策科技有限公司 Reinforced learning training method and decision-making method based on reinforced learning
CN112800545B (en) * 2021-01-28 2022-06-24 中国地质大学(武汉) Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN
CN112947431B (en) * 2021-02-03 2023-06-06 海之韵(苏州)科技有限公司 Unmanned ship path tracking method based on reinforcement learning
CN113176776B (en) * 2021-03-03 2022-08-19 上海大学 Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning
CN113297801B (en) * 2021-06-15 2022-10-14 哈尔滨工程大学 Marine environment element prediction method based on STEOF-LSTM
WO2023108494A1 (en) * 2021-12-15 2023-06-22 中国科学院深圳先进技术研究院 Probability filtering reinforcement learning-based unmanned ship control method and apparatus, and terminal device
CN114371700B (en) * 2021-12-15 2023-07-18 中国科学院深圳先进技术研究院 Probability filtering reinforcement learning unmanned ship control method and device and terminal equipment
CN114721409B (en) * 2022-06-08 2022-09-20 山东大学 Underwater vehicle docking control method based on reinforcement learning
CN114942596B (en) * 2022-07-26 2022-11-18 山脉科技股份有限公司 Intelligent control system for urban flood control and drainage
CN115493595A (en) * 2022-09-28 2022-12-20 天津大学 AUV path planning method based on local perception and near-end optimization strategy
CN115855226B (en) * 2023-02-24 2023-05-30 青岛科技大学 Multi-AUV cooperative underwater data acquisition method based on DQN and matrix completion

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102278995A (en) * 2011-04-27 2011-12-14 中国石油大学(华东) Bayes path planning device and method based on GPS (Global Positioning System) detection
CN102788581A (en) * 2012-07-17 2012-11-21 哈尔滨工程大学 Ship route planning method based on modified differential evolution algorithm
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110514206A (en) * 2019-08-02 2019-11-29 中国航空无线电电子研究所 A kind of unmanned plane during flying path prediction technique based on deep learning
CN111338356A (en) * 2020-04-07 2020-06-26 哈尔滨工程大学 Multi-target unmanned ship collision avoidance path planning method for improving distributed genetic algorithm

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR102215520B1 (en) * 2018-09-13 2021-02-15 주식회사 웨더아이 Method and server for providing course information of vessel including coast weather information


Non-Patent Citations (1)

Title
An Autonomous Path Planning Model for Unmanned Ships Based on Deep Reinforcement Learning; Siyu Guo et al.; Sensors, vol. 20, no. 2; 2020-01-11; full text *

Also Published As

Publication number Publication date
CN111829527A (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN111829527B (en) Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements
CN111780777B (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
CN112650237B (en) Ship path planning method and device based on clustering processing and artificial potential field
CN111399506A (en) Global-local hybrid unmanned ship path planning method based on dynamic constraints
CN111273670B (en) Unmanned ship collision prevention method for fast moving obstacle
KR102373472B1 (en) Method and device for seamless parameter switch by using location-specific algorithm selection to achieve optimized autonomous driving in each of regions
Wang et al. Cooperative collision avoidance for unmanned surface vehicles based on improved genetic algorithm
CN111880549A (en) Unmanned ship path planning-oriented deep reinforcement learning reward function optimization method
Deraj et al. Deep reinforcement learning based controller for ship navigation
CN112650246B (en) Ship autonomous navigation method and device
CN112819255B (en) Multi-criterion ship route determining method and device, computer equipment and readable storage medium
CN112880678A (en) Unmanned ship navigation planning method in complex water area environment
CN112462786A (en) Unmanned ship collision avoidance method based on fuzzy control strategy double-window algorithm
CN117193296A (en) Improved A star unmanned ship path planning method based on high safety
CN114387822B (en) Ship collision prevention method
Seo et al. Ship collision avoidance route planning using CRI-based A∗ algorithm
Su et al. A constrained locking sweeping method and velocity obstacle based path planning algorithm for unmanned surface vehicles in complex maritime traffic scenarios
Guo et al. Mission-driven path planning and design of submersible unmanned ship with multiple navigation states
Wang et al. Roboat III: An autonomous surface vessel for urban transportation
Masmoudi et al. Autonomous car-following approach based on real-time video frames processing
Gao et al. An optimized path planning method for container ships in Bohai bay based on improved deep Q-learning
CN116952239A (en) Unmanned ship path planning method based on fusion of improved A and DWA
CN117311160A (en) Automatic control system and control method based on artificial intelligence
CN117369441A (en) Self-adaptive intelligent ship path planning method considering ship kinematics and COLLEGs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant