CN115599093A - Self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning - Google Patents

Self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning

Info

Publication number
CN115599093A
CN115599093A (application CN202211171757.8A)
Authority
CN
China
Prior art keywords
unmanned ship
fuzzy
obstacle
unmanned
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211171757.8A
Other languages
Chinese (zh)
Inventor
王国胤
段振华
刘群
石岩
邹贵银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202211171757.8A priority Critical patent/CN115599093A/en
Publication of CN115599093A publication Critical patent/CN115599093A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/0206Control of position or course in two dimensions specially adapted to water vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention belongs to the technical field of unmanned ship navigation, and particularly relates to a self-adaptive unmanned ship path planning method based on fuzzy sets and deep reinforcement learning. The method generates an obstacle environment and records its information, including the positions of obstacles and of the target point; constructs fuzzy rules for fuzzy control of the unmanned ship; during navigation, computes in real time the distances from the unmanned ship to the obstacles and to the target point, as well as the yaw angle of the unmanned ship; processes these real-time results with the unmanned ship fuzzy rules to output a fuzzy coefficient; lets the reward function in deep reinforcement learning give the unmanned ship an adaptive reward according to the fuzzy coefficient; trains an unmanned ship path planning model from the adaptive rewards received in different states; and uses the trained model to plan an optimal path autonomously. The method realizes unmanned ship path planning, ensures the safety of the unmanned ship, and improves the efficiency with which the unmanned ship executes tasks.

Description

Self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning
Technical Field
The invention belongs to the technical field of unmanned ship navigation, and particularly relates to a self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning.
Background
Research on unmanned ships began at the end of the 20th century. In the development and exploration of marine resources, unmanned ships enjoy good development space and prospects in fields such as marine environment monitoring, sea-surface search and rescue, and maritime patrol, by virtue of advantages such as small size, high maneuverability, and no risk of casualties; unmanned ship technology remains an important research direction to this day. However, because the sea-surface environment is changeable and hard to measure, and navigation environments grow increasingly complex, improving the autonomous navigation capability of the unmanned ship has great practical significance. To ensure that the unmanned ship can reach a task point quickly and safely to complete a preset task, path planning is a key technology for guaranteeing its autonomous intelligence.
Path planning is the decision-making link in autonomous intelligent navigation. Its goal is to establish an optimal path from a starting point to a target point and to guarantee the safety of that path while meeting the requirement of minimum cost for the unmanned ship. At present, conventional path planning algorithms include the A* algorithm, which extends Dijkstra's shortest-path algorithm from graph theory, variants of A*, and a number of intelligent optimization algorithms, such as the genetic algorithm, ant colony algorithm, simulated annealing algorithm, particle swarm algorithm, and combinations of these. However, these traditional algorithms depend too heavily on an environment model and on global environment information, and are therefore severely limited in their application scenarios. The complex and changeable marine environment requires the unmanned ship to have the ability to learn autonomously.
Reinforcement learning is an important area of machine learning. It emphasizes that an agent obtains the maximum reward by interacting with the environment, and it is intended for decision problems, especially sequential decision problems. The path planning problem of the unmanned ship can likewise be regarded as a sequential decision problem: making the currently optimal navigation action in each state. Deep reinforcement learning has been one of the most closely watched directions in artificial intelligence in recent years; it combines the perception capability of deep learning with the decision capability of reinforcement learning to control the agent's behavior directly from high-dimensional sensory input. In reinforcement learning, the agent measures the quality of an action at a given moment only through the reward signal obtained by interacting with the environment, and it continuously optimizes its decisions through the feedback of that signal so as to maximize the expected return. Therefore, in deep reinforcement learning, the design of the reward function directly influences the training of the model.
In the path planning task, the design of the reward function naturally runs into the sparse-reward problem. The agent obtains a positive reward only when it reaches the target point; it obtains a negative reward when it reaches the boundary or collides with an obstacle, and it also obtains a small negative reward for energy consumption in the normal navigation state. Because the agent cannot collect enough positive rewards during interaction, learning is slow or even impossible. At present, most unmanned ship path planning based on deep reinforcement learning uses the negative of the distance between the unmanned ship and the target point as the reward function; however, such a reward function makes the algorithm converge slowly, lengthens the training period, and can even lead to learning wrong decisions. It is therefore necessary to design the reward function properly in order to obtain optimal path decisions. If the agent is adaptively given corresponding rewards according to its state at each moment, and the network parameters are optimized through these reward signals, the decision-making capability of the model can be improved and the optimal path decision obtained.
Disclosure of Invention
In order to solve the problems, the invention provides a self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning, which comprises the following steps:
S1, generating an obstacle environment for unmanned ship training and recording the obstacle environment information, including the positions of obstacles and the position of the target point;
S2, introducing fuzzy logic to construct the fuzzy rules of the unmanned ship, and describing the distance between the unmanned ship and an obstacle, the distance between the unmanned ship and the target point, and the yaw angle of the unmanned ship with membership functions;
S3, during navigation, calculating in real time the distances from the unmanned ship to the obstacle and to the target point, as well as the yaw angle of the unmanned ship;
S4, processing the real-time results of step S3 with the unmanned ship fuzzy rules, and outputting in real time a fuzzy coefficient lying in the interval [0,1];
S5, designing the unmanned ship reward function, which gives the unmanned ship an adaptive reward according to the fuzzy coefficient;
S6, constructing an unmanned ship path planning model based on deep reinforcement learning, training it with the adaptive rewards the unmanned ship receives in different states, and using the trained model to plan an optimal path autonomously.
Further, the obstacle environment is generated randomly with Python's graphical interface module Tkinter, and the number of obstacles in the environment is also random.
Further, the unmanned ship fuzzy rules comprise an obstacle fuzzy rule and a target fuzzy rule, specifically:
acquiring the distance between the unmanned ship and the obstacle, the distance between the unmanned ship and the target point, and the yaw angle of the unmanned ship; taking the distance to the obstacle and the yaw angle as the input variables of the obstacle fuzzy rule; taking the distance to the target point and the yaw angle as the input variables of the target fuzzy rule; and taking the output variables of the obstacle fuzzy rule and the target fuzzy rule as a penalty fuzzy coefficient and a reward fuzzy coefficient, respectively;
obstacle fuzzy rule: the input variables are fuzzified. The distance between the unmanned ship and the obstacle is divided into 5 segments, where BVN means the obstacle is very near, BN near, BA at a moderate distance, BF far, and BVF very far. The yaw angle of the unmanned ship is divided into 5 segments, where NRB denotes a large angle to the right, NRS a small angle to the right, Z zero, PLS a small angle to the left, and PLB a large angle to the left. The output penalty fuzzy coefficient is divided into 5 segments, where PVS means the penalty fuzzy coefficient is very small, PS small, PM medium, PB large, and PVB very large, and 25 penalty fuzzy rules are established: the nearer the unmanned ship is to the obstacle and the smaller its yaw angle, the larger the penalty fuzzy coefficient; the farther the unmanned ship is from the obstacle and the larger its yaw angle, the smaller the penalty fuzzy coefficient;
target fuzzy rule: the input variables are fuzzified. The distance between the unmanned ship and the target point is divided into 5 segments, where TVN means the target point is very near, TN near, TA at a moderate distance, TF far, and TVF very far. The yaw angle of the unmanned ship is divided into 5 segments in the same way as in the obstacle fuzzy rule: NRB denotes a large angle to the right, NRS a small angle to the right, Z zero, PLS a small angle to the left, and PLB a large angle to the left. The output reward fuzzy coefficient is divided into 5 segments, where RVS means the reward fuzzy coefficient is very small, RS small, RM medium, RB large, and RVB very large, and 25 reward fuzzy rules are established: the nearer the unmanned ship is to the target point and the smaller its yaw angle, the larger the reward fuzzy coefficient; the farther the unmanned ship is from the target point and the larger its yaw angle, the smaller the reward fuzzy coefficient.
Furthermore, the reward function comprises three parts: normal navigation, obstacle avoidance, and reaching the target point. Normal navigation means that no obstacle lies within the detection range of the unmanned ship, and the normal-navigation reward function R_n is expressed as:
[Equation image not reproduced: R_n as a function of ρ_goal, d_goal and d_max.]
where ρ_goal denotes the reward fuzzy coefficient obtained by feeding the current distance between the unmanned ship and the target point and the current yaw angle of the unmanned ship into the fuzzy logic controller, d_goal denotes the distance between the unmanned ship and the target point at the current moment, and d_max denotes the distance between the initial position of the unmanned ship and the target point;
obstacle avoidance means that an obstacle lies within the detection range of the unmanned ship, and the obstacle-avoidance reward function R_c is expressed as:
[Equation image not reproduced: R_c as a function of ρ_obs, r_det and d_obs.]
where ρ_obs denotes the penalty fuzzy coefficient obtained by feeding the current distance between the unmanned ship and the obstacle and the current yaw angle of the unmanned ship into the fuzzy logic controller, r_det denotes the maximum radius of the detection range, and d_obs denotes the distance between the unmanned ship and the obstacle at the current moment.
Further, the target-point reward function R_end is a constant: when the unmanned ship reaches the target point, the feedback is a reward of R_end; when the unmanned ship reaches the boundary of the environment, the feedback is a penalty of R_end; and when the unmanned ship collides with an obstacle, the feedback is a penalty of R_end.
The invention has the beneficial effects that:
1. For real-world path planning problems, traditional algorithms must build a model of the environment in advance, yet in practice it is very difficult to know the environment information beforehand. The invention therefore uses a reinforcement learning method that can work in unknown environments, giving the path planning task the ability to adapt to the environment.
2. In the DQN network model, a convolutional neural network approximates the value function, solving the dimension-explosion problem of the Q-learning model; an experience replay mechanism stores sequential samples, improving sample utilization and reducing sample correlation; and a dual-network mechanism, in which a current network and a target network jointly optimize the model, improves the stability of the training process.
3. The invention designs the reward function by introducing fuzzy logic, so that the reward feedback in reinforcement learning can be adjusted adaptively according to the state. This yields high-quality experience samples, shortens the training time, and better guides the agent toward the correct target position.
Drawings
Fig. 1 is a step diagram of a path planning method of an adaptive unmanned surface vehicle according to the present invention;
fig. 2 is a schematic diagram of a DQN network model of the present invention;
FIG. 3 is a diagram illustrating the design of reward functions of the present invention;
FIG. 4 is a schematic diagram of an unmanned surface vehicle path planning system employed by the present invention;
FIG. 5 is a schematic diagram of fuzzy logic control according to the present invention;
FIG. 6 is the yaw angle membership function of the present invention;
FIG. 7 is the obstacle distance membership function of the present invention;
FIG. 8 is the penalty fuzzy coefficient membership function of the present invention;
FIG. 9 is the target distance membership function of the present invention;
FIG. 10 is the reward fuzzy coefficient membership function of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention discloses a self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning, which comprises the following steps of:
S1, generating an obstacle environment for unmanned ship training and recording the obstacle environment information, including the positions of obstacles and the position of the target point;
S2, introducing fuzzy logic to construct the fuzzy rules of the unmanned ship, and describing the distance between the unmanned ship and an obstacle, the distance between the unmanned ship and the target point, and the yaw angle of the unmanned ship with membership functions;
S3, during navigation, calculating in real time the distances from the unmanned ship to the obstacle and to the target point, as well as the yaw angle of the unmanned ship;
S4, processing the real-time results of step S3 with the unmanned ship fuzzy rules, and outputting in real time a fuzzy coefficient lying in the interval [0,1];
As shown in fig. 5, the distances and the yaw angle are input to the fuzzy controller. The input data are first fuzzified; the fuzzified data are then processed by an inference engine built from the rule base designed with the unmanned ship fuzzy rules of step S2 and from a database that handles the fuzzy data; finally, the result is defuzzified and a fuzzy coefficient in the interval [0,1] is output.
S5, designing the unmanned ship reward function, which gives the unmanned ship an adaptive reward according to the fuzzy coefficient;
S6, constructing an unmanned ship path planning model based on deep reinforcement learning, training it with the adaptive rewards the unmanned ship receives in different states, and using the trained model to plan an optimal path autonomously.
Specifically, the unmanned ship fuzzy rules include an obstacle fuzzy rule and a target fuzzy rule:
and acquiring the distance between the unmanned boat and the obstacle, the distance between the unmanned boat and the target point and the yaw angle of the unmanned boat.
TABLE 1 obstacle fuzzy rule
[Table 1 is an image in the original document and is not reproduced here.]
The obstacle fuzzy rules are shown in Table 1. The distance between the unmanned surface vehicle and the obstacle is divided into 5 segments by fuzzy linguistic variables, with distances normalized to the range [0,20]. As shown in FIG. 7, the exact distance is translated into memberships over different ranges, fuzzifying the value: [0,5] BVN means very close to the obstacle, [0,10] BN close, [5,15] BA a moderate distance, [10,20] BF far, and [15,20] BVF very far. The yaw angle of the unmanned ship is likewise divided into 5 segments by fuzzy linguistic variables; its range and the interval for each segment are given as equation images in the original document (not reproduced here), and its variation is shown in FIG. 6: NRB denotes a large angle to the right, NRS a small angle to the right, Z zero, PLS a small angle to the left, and PLB a large angle to the left. The penalty fuzzy coefficient is also divided into 5 segments using fuzzy linguistic variables, as shown in FIG. 8: [0,0.25] PVS means the penalty fuzzy coefficient is very small, [0,0.5] PS small, [0.25,0.75] PM moderate, [0.5,1] PB large, and [0.75,1] PVB very large. The obstacle fuzzy rules are constructed from human expert experience: when the unmanned ship is closer to the obstacle and its yaw angle is smaller, the obstacle lies on the current path of the unmanned ship toward the target point and the ship is approaching it, so the output penalty fuzzy coefficient is larger and the subsequently computed penalty is larger; when the unmanned ship is farther from the obstacle and its yaw angle is larger, the ship is far from the obstacle but deviating from the target point, so the output penalty fuzzy coefficient decreases and the penalty in that state is smaller.
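To make the fuzzification concrete, the following is a minimal Python sketch of membership functions over the normalized obstacle-distance range [0,20]. The interval endpoints come from the description above; the triangular and shoulder shapes are assumptions, since the exact curves appear only in FIG. 7 of the patent.

```python
def tri(x, a, b, c):
    """Triangular membership: rises on [a, b], falls on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def shoulder_left(x, c, d):
    """Left shoulder: membership 1 up to c, falling to 0 at d."""
    if x <= c:
        return 1.0
    return max(0.0, (d - x) / (d - c))

def shoulder_right(x, a, b):
    """Right shoulder: membership 0 up to a, rising to 1 at b and beyond."""
    if x >= b:
        return 1.0
    return max(0.0, (x - a) / (b - a))

def fuzzify_obstacle_distance(d):
    """Fuzzify an obstacle distance normalized to [0, 20]
    (interval endpoints from the text; shapes assumed)."""
    return {
        "BVN": shoulder_left(d, 0.0, 5.0),      # very near: [0, 5]
        "BN":  tri(d, 0.0, 5.0, 10.0),          # near:      [0, 10]
        "BA":  tri(d, 5.0, 10.0, 15.0),         # moderate:  [5, 15]
        "BF":  tri(d, 10.0, 15.0, 20.0),        # far:       [10, 20]
        "BVF": shoulder_right(d, 15.0, 20.0),   # very far:  [15, 20]
    }
```

The target-distance variables (TVN through TVF) and the yaw-angle variables (NRB through PLB) are fuzzified in exactly the same way over their own intervals.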
TABLE 2 target fuzzy rules
[Table 2 is an image in the original document and is not reproduced here.]
The target fuzzy rules are shown in Table 2. The distance between the unmanned ship and the target point is divided into 5 segments using fuzzy linguistic variables, with distances normalized to the range [0,20], as shown in FIG. 9: [0,5] TVN means very close to the target point, [0,10] TN close, [5,15] TA a moderate distance, [10,20] TF far, and [15,20] TVF very far. The yaw angle of the unmanned ship is divided into 5 segments in the same way as in the obstacle fuzzy rules, with the interval for each segment again given as equation images in the original document (not reproduced here): NRB denotes a large angle to the right, NRS a small angle to the right, Z zero, PLS a small angle to the left, and PLB a large angle to the left. The reward fuzzy coefficient is divided into 5 segments by fuzzy linguistic variables, as shown in FIG. 10: [0,0.25] RVS means the reward fuzzy coefficient is very small, [0,0.5] RS small, [0.25,0.75] RM moderate, [0.5,1] RB large, and [0.75,1] RVB very large. The target fuzzy rules are constructed from human expert experience: if the unmanned ship is closer to the target point and its yaw angle is smaller, its course is correct and it is approaching the target point, so the reward fuzzy coefficient is larger and the reward value is larger; if the unmanned ship is farther from the target point and its yaw angle is larger, it is moving away from the target point, so the reward fuzzy coefficient decreases and the reward in that state is smaller.
The membership values of the fuzzy coefficient in the different fuzzy linguistic variables are obtained according to the fuzzy rules, and the final output value is obtained by the area barycenter (centroid) method.
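As an illustration of steps S2 and S4, here is a minimal sketch of the fuzzification, inference and area-barycenter defuzzification pipeline for the penalty fuzzy coefficient. It reuses the membership helpers sketched above, assumes Mamdani-style min/max inference (the patent does not name the inference operators), and lists only a handful of the 25 rules as hypothetical examples of the trend stated in the text; the full rule table is in the patent's Table 1.

```python
# Output membership functions for the penalty fuzzy coefficient on [0, 1]
# (interval endpoints from FIG. 8; triangular/shoulder shapes assumed)
PENALTY_MF = {
    "PVS": lambda p: shoulder_left(p, 0.0, 0.25),
    "PS":  lambda p: tri(p, 0.0, 0.25, 0.5),
    "PM":  lambda p: tri(p, 0.25, 0.5, 0.75),
    "PB":  lambda p: tri(p, 0.5, 0.75, 1.0),
    "PVB": lambda p: shoulder_right(p, 0.75, 1.0),
}

# A few illustrative rules (distance term, yaw term) -> penalty term,
# following the stated trend: nearer obstacle + smaller yaw angle
# gives a larger penalty coefficient.
RULES = [
    (("BVN", "Z"),   "PVB"),
    (("BVN", "NRS"), "PB"),
    (("BA",  "Z"),   "PM"),
    (("BF",  "PLS"), "PS"),
    (("BVF", "PLB"), "PVS"),
]

def infer_penalty(d_obs, yaw_memberships, n_points=101):
    """Mamdani min/max inference, then area-barycenter defuzzification.
    yaw_memberships: dict of memberships for NRB/NRS/Z/PLS/PLB."""
    dist_mu = fuzzify_obstacle_distance(d_obs)

    def aggregated(p):
        # Fire each rule with min, clip its output set, aggregate with max.
        best = 0.0
        for (dist_term, yaw_term), out_term in RULES:
            fire = min(dist_mu[dist_term], yaw_memberships[yaw_term])
            best = max(best, min(fire, PENALTY_MF[out_term](p)))
        return best

    # Centroid (area barycenter) over a discretized universe [0, 1]
    xs = [i / (n_points - 1) for i in range(n_points)]
    num = sum(x * aggregated(x) for x in xs)
    den = sum(aggregated(x) for x in xs)
    return num / den if den > 0 else 0.0
```

The reward fuzzy coefficient is inferred the same way, with the target-distance memberships and the RVS-RVB output sets in place of their penalty counterparts.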
Specifically, the reward function consists of three parts (normal navigation, obstacle avoidance, and reaching the target point), expressed as:
[Equation image not reproduced: the overall reward R, defined piecewise over the three cases.]
When calculating the normal-navigation reward, it is first judged whether the detection range of the unmanned ship contains any obstacle. If there is none, the unmanned ship is in a safe area, and the reward is calculated with the normal-navigation reward function:
[Equation image not reproduced: R_n as a function of ρ_goal, d_goal and d_max.]
where ρ_goal is the reward fuzzy coefficient obtained by feeding the current distance between the unmanned ship and the target point and the current yaw angle of the unmanned ship into the fuzzy logic controller; as the unmanned ship moves, its distance to the target point changes, so the reward fuzzy coefficient differs from moment to moment. d_goal denotes the distance between the unmanned ship and the target point at the current moment, and d_max denotes the distance between the initial position of the unmanned ship and the target point.
Obstacle avoidance means that an obstacle lies within the detection range of the unmanned ship; the area is then no longer safe and there is a danger of collision between the unmanned ship and the obstacle. The obstacle-avoidance reward function R_c is expressed as:
[Equation image not reproduced: R_c as a function of ρ_obs, r_det and d_obs.]
where ρ_obs is the penalty fuzzy coefficient obtained by feeding the current distance between the unmanned ship and the obstacle and the current yaw angle of the unmanned ship into the fuzzy logic controller; since the unmanned ship moves continuously, its distance to the obstacle keeps changing as it travels, so the fuzzy coefficient differs from moment to moment. r_det denotes the maximum radius of the detection range, and d_obs denotes the distance between the unmanned ship and the obstacle at the current moment.
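Because the formulas for R_n and R_c appear only as images in the original document, the sketch below uses assumed functional forms that are merely consistent with the variables named in the text: the reward grows as the ship approaches the target, the penalty grows as the obstacle gets closer, and both are scaled by the corresponding fuzzy coefficient. The patent's exact expressions may differ.

```python
def normal_navigation_reward(rho_goal, d_goal, d_max):
    # Assumed form (the patent's exact formula is an image): fuzzy-scaled
    # progress toward the target point, lying in [0, rho_goal].
    return rho_goal * (d_max - d_goal) / d_max

def obstacle_avoidance_reward(rho_obs, d_obs, r_det):
    # Assumed form: fuzzy-scaled penalty that grows as the obstacle
    # gets closer inside the detection radius r_det.
    return -rho_obs * (r_det - d_obs) / r_det
```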
Specifically, the unmanned ship path planning model is constructed based on deep reinforcement learning and trained with the adaptive rewards the unmanned ship receives in different states. As shown in fig. 2, the DQN network model solves the sample-correlation problem by introducing an experience replay pool, and improves the stability of the algorithm through a dual-network mechanism with a target network and a current network. At time t the agent interacts once with the environment to obtain the reward and the state at time t+1, and stores the experience tuple (s_t, a_t, r_t, s_{t+1}) in the experience replay pool. Experience data sampled from the replay pool are fed into the target network to predict the target Q value y_t = r_t + λ max_a Q(s_{t+1}, a; θ). The dual-network mechanism of the DQN model uses two networks with the same structure but different parameters: the current network predicts the estimated Q value, the target network predicts the target Q value, and the current network parameters are updated with the loss function L(θ) = (y_t − Q(s_t, a_t; θ))². Every C time steps, the parameters of the current network are copied to the target network.
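As a concrete reference for these mechanics (replay pool, dual networks with identical structure but separate parameters, and parameter copying every C steps), here is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the patent's implementation: the patent mentions a convolutional network over high-dimensional input, while this sketch uses a small fully connected network on a low-dimensional state vector, and the optimizer and all hyperparameter values are placeholders.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP value network (architecture assumed; the patent uses a CNN)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s):
        return self.net(s)

class DQNAgent:
    """DQN with an experience replay pool and a periodically synced target network."""
    def __init__(self, state_dim, n_actions, gamma=0.99, lr=1e-3,
                 buffer_size=10000, batch_size=64, sync_every=200):
        self.q = QNet(state_dim, n_actions)            # current network (theta)
        self.q_target = QNet(state_dim, n_actions)     # target network
        self.q_target.load_state_dict(self.q.state_dict())
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=buffer_size)
        self.gamma, self.batch_size, self.sync_every = gamma, batch_size, sync_every
        self.n_actions, self.step_count = n_actions, 0

    def act(self, state, eps=0.1):
        if random.random() < eps:                      # epsilon-greedy exploration
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return self.q(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))    # (s_t, a_t, r_t, s_{t+1})

    def update(self):
        if len(self.buffer) < self.batch_size:
            return
        batch = random.sample(self.buffer, self.batch_size)
        s, a, r, s_next, done = map(list, zip(*batch))
        s = torch.as_tensor(s, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.as_tensor(r, dtype=torch.float32)
        s_next = torch.as_tensor(s_next, dtype=torch.float32)
        done = torch.as_tensor(done, dtype=torch.float32)
        with torch.no_grad():
            # y_t = r_t + lambda * max_a Q(s_{t+1}, a) from the target network
            y = r + self.gamma * (1 - done) * self.q_target(s_next).max(1).values
        q_sa = self.q(s).gather(1, a).squeeze(1)       # Q(s_t, a_t; theta)
        loss = nn.functional.mse_loss(q_sa, y)         # L(theta) = (y_t - Q(s_t, a_t; theta))^2
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        self.step_count += 1
        if self.step_count % self.sync_every == 0:     # copy theta to the target net every C steps
            self.q_target.load_state_dict(self.q.state_dict())
```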
In the above embodiment, the Tkinter module of Python is used to construct the simulated training environment of the unmanned ship, which comprises three main parts: the unmanned ship, the obstacles, and the target position. In this embodiment, the unmanned surface vehicle is the agent controlled by reinforcement learning, and its detection range is a 135° sector with radius R. In the randomly generated simulated training environment, the number and size of the obstacles are random; the simulated environment information is saved for later model training.
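A minimal sketch of such a randomly generated Tkinter environment is given below. The canvas size, obstacle counts and radii, and the marker shapes are illustrative choices, not values taken from the patent.

```python
import random
import tkinter as tk

def make_environment(width=400, height=400, max_obstacles=8):
    """Randomly generate an obstacle environment on a Tkinter canvas and
    record the obstacle, target, and start positions (illustrative sketch)."""
    root = tk.Tk()
    root.title("USV training environment")
    canvas = tk.Canvas(root, width=width, height=height, bg="white")
    canvas.pack()

    # Random number and size of circular obstacles
    obstacles = []
    for _ in range(random.randint(3, max_obstacles)):
        r = random.randint(10, 30)
        x, y = random.randint(r, width - r), random.randint(r, height - r)
        canvas.create_oval(x - r, y - r, x + r, y + r, fill="gray")
        obstacles.append((x, y, r))

    # Target point (red) and unmanned-ship start position (blue)
    tx, ty = random.randint(20, width - 20), random.randint(20, height - 20)
    canvas.create_oval(tx - 6, ty - 6, tx + 6, ty + 6, fill="red")
    sx, sy = 20, height - 20
    canvas.create_rectangle(sx - 5, sy - 5, sx + 5, sy + 5, fill="blue")

    # Environment information saved for later model training
    return root, {"obstacles": obstacles, "target": (tx, ty), "start": (sx, sy)}
```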
The position information of the unmanned ship is given in a plane rectangular coordinate system: (x_t, y_t) is the position of the unmanned ship at time t; θ_t is its sailing direction at time t; α_t is the angle between the line connecting the unmanned ship to the target position at time t and the positive x-axis; β_t is the angle between the line connecting the unmanned ship to the scanned obstacle position at time t and the positive x-axis; d_obs is the distance from the unmanned ship to the obstacle at time t; d_goal is the distance from the unmanned ship to the target position at time t; and the yaw angle of the unmanned ship is Δθ = θ_t − α_t. If no obstacle lies in the scanning area of the unmanned ship, d_obs = R.
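These state quantities follow directly from the coordinates. The sketch below assumes the sign convention Δθ = θ_t − α_t reconstructed from the garbled original, with the angle wrapped to (−π, π].

```python
import math

def usv_state(pos, heading, target, obstacle=None, detect_radius=20.0):
    """Compute d_goal, d_obs and the yaw angle from the quantities defined
    above (a sketch; the yaw-angle sign convention is assumed)."""
    x_t, y_t = pos
    d_goal = math.hypot(target[0] - x_t, target[1] - y_t)
    alpha_t = math.atan2(target[1] - y_t, target[0] - x_t)  # angle of USV-target line to +x axis
    yaw = heading - alpha_t                                 # delta_theta = theta_t - alpha_t
    yaw = math.atan2(math.sin(yaw), math.cos(yaw))          # wrap to (-pi, pi]
    if obstacle is None:
        d_obs = detect_radius                               # no obstacle in scan area: d_obs = R
    else:
        d_obs = math.hypot(obstacle[0] - x_t, obstacle[1] - y_t)
    return d_goal, d_obs, yaw
```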
Specifically, as shown in fig. 3, the adaptive unmanned ship path planning process based on fuzzy sets and deep reinforcement learning includes:
S11, calculating the obstacle distance between the unmanned ship and the nearest obstacle within the scanning range at time t, the distance between the unmanned ship and the target point, and the yaw angle of the unmanned ship;
S12, judging whether the obstacle distance is smaller than the detection radius R of the unmanned ship's detection range; if so, executing step S13, otherwise executing step S15;
S13, judging whether the obstacle distance is smaller than the safe distance, i.e. the minimum distance at which the unmanned ship is guaranteed not to collide; if so, the unmanned ship has collided with an obstacle or crossed the environment boundary, and a penalty is fed back to it; otherwise, executing step S14;
S14, judging whether the target distance is smaller than the safe distance; if so, the unmanned ship has reached the target position, a reward is fed back to it, and this round of training ends; otherwise, the unmanned ship has found an obstacle within its detection range, so the penalty is calculated with the obstacle-avoidance reward function, t = t + 1, and step S11 is executed;
S15, judging whether the target distance is smaller than the safe distance; if so, the unmanned ship has reached the target position, a reward is fed back to it, and this round of training ends; otherwise, the unmanned ship is in a safe area, the reward is calculated with the normal-navigation reward function, t = t + 1, and step S11 is executed. A per-step sketch of this dispatch logic is given below.
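Combining the assumed reward forms sketched earlier, a hypothetical per-step implementation of steps S11-S15 might look like this. Treating the boundary/collision feedback as −R_end is an assumption; the patent states only that a penalty of magnitude R_end is fed back.

```python
def step_reward(d_obs, d_goal, rho_goal, rho_obs, R, d_safe, R_end, d_max):
    """Reward dispatch following steps S11-S15 (sketch). Reuses the assumed
    normal_navigation_reward / obstacle_avoidance_reward forms from above.
    Returns (reward, episode_done)."""
    if d_obs < R:                                   # obstacle inside detection range (S12 -> S13)
        if d_obs < d_safe:
            return -R_end, True                     # collision or boundary: penalty, round ends
        if d_goal < d_safe:
            return R_end, True                      # target reached: reward, round ends (S14)
        return obstacle_avoidance_reward(rho_obs, d_obs, R), False
    if d_goal < d_safe:
        return R_end, True                          # target reached in a safe area (S15)
    return normal_navigation_reward(rho_goal, d_goal, d_max), False
```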
The traditional reward function takes the negative of the distance between the unmanned ship and the target position as the reward, or assigns penalties only according to whether a collision occurs; this makes the algorithm converge slowly, lengthens the training period, and can even cause training to fail. The present method constructs the reward function from fuzzy coefficients. As shown in fig. 4, the unmanned ship perceives environment information and obtains a series of historical experience records through interaction with the environment; each record comprises the state information and decision information at time t and the reward obtained by executing the decision. An improved DQN model based on the Q-learning algorithm is trained on this historical experience data, and the trained model finally plans a path with low time overhead and high safety for the unmanned ship in the obstacle environment.
In conclusion, the unmanned surface vehicle path planning method disclosed by the invention realizes unmanned ship path planning and can be used for tasks such as ocean patrol, water surface monitoring, and water surface rescue; the planned path is safe and fast. The method avoids collisions between the unmanned ship and obstacles during a task, ensures the safety of the unmanned ship, and improves the efficiency with which the unmanned ship executes tasks.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning is characterized by comprising the following steps:
S1, generating an obstacle environment for unmanned ship training and recording the obstacle environment information, including the positions of obstacles and the position of the target point;
S2, introducing fuzzy logic to construct the fuzzy rules of the unmanned ship, and describing the distance between the unmanned ship and an obstacle, the distance between the unmanned ship and the target point, and the yaw angle of the unmanned ship with membership functions;
S3, during navigation, calculating in real time the distances from the unmanned ship to the obstacle and to the target point, as well as the yaw angle of the unmanned ship;
S4, processing the real-time results of step S3 with the unmanned ship fuzzy rules, and outputting in real time a fuzzy coefficient lying in the interval [0,1];
S5, designing the unmanned ship reward function, which gives the unmanned ship an adaptive reward according to the fuzzy coefficient;
S6, constructing an unmanned ship path planning model based on deep reinforcement learning, training it with the adaptive rewards the unmanned ship receives in different states, and using the trained model to plan an optimal path autonomously.
2. The adaptive unmanned ship path planning method based on fuzzy sets and deep reinforcement learning of claim 1, wherein the obstacle environment is generated randomly with Python's graphical interface module Tkinter, and the number of obstacles in the obstacle environment is also random.
3. The adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning according to claim 1, wherein the unmanned ship fuzzy rules include an obstacle fuzzy rule and a target fuzzy rule, and specifically include:
acquiring the distance between the unmanned ship and the obstacle, the distance between the unmanned ship and the target point, and the yaw angle of the unmanned ship; taking the distance to the obstacle and the yaw angle as the input variables of the obstacle fuzzy rule; taking the distance to the target point and the yaw angle as the input variables of the target fuzzy rule; and taking the output variables of the obstacle fuzzy rule and the target fuzzy rule as a penalty fuzzy coefficient and a reward fuzzy coefficient, respectively;
obstacle fuzzy rule: the input variables are fuzzified. The distance between the unmanned ship and the obstacle is divided into 5 segments, where BVN means the obstacle is very near, BN near, BA at a moderate distance, BF far, and BVF very far. The yaw angle of the unmanned ship is divided into 5 segments, where NRB denotes a large angle to the right, NRS a small angle to the right, Z zero, PLS a small angle to the left, and PLB a large angle to the left. The output penalty fuzzy coefficient is divided into 5 segments, where PVS means the penalty fuzzy coefficient is very small, PS small, PM medium, PB large, and PVB very large, and 25 penalty fuzzy rules are established: the nearer the unmanned ship is to the obstacle and the smaller its yaw angle, the larger the penalty fuzzy coefficient; the farther the unmanned ship is from the obstacle and the larger its yaw angle, the smaller the penalty fuzzy coefficient;
target fuzzy rule: the input variables are fuzzified. The distance between the unmanned ship and the target point is divided into 5 segments, where TVN means the target point is very near, TN near, TA at a moderate distance, TF far, and TVF very far. The yaw angle of the unmanned ship is divided into 5 segments in the same way as in the obstacle fuzzy rule: NRB denotes a large angle to the right, NRS a small angle to the right, Z zero, PLS a small angle to the left, and PLB a large angle to the left. The output reward fuzzy coefficient is divided into 5 segments, where RVS means the reward fuzzy coefficient is very small, RS small, RM medium, RB large, and RVB very large, and 25 reward fuzzy rules are established: the nearer the unmanned ship is to the target point and the smaller its yaw angle, the larger the reward fuzzy coefficient; the farther the unmanned ship is from the target point and the larger its yaw angle, the smaller the reward fuzzy coefficient.
4. The adaptive unmanned ship path planning method based on fuzzy sets and deep reinforcement learning of claim 1, wherein the reward function comprises three parts: normal navigation, obstacle avoidance, and reaching the target point; normal navigation means that no obstacle lies within the detection range of the unmanned ship, and the normal-navigation reward function R_n is expressed as:
[Equation image not reproduced: R_n as a function of ρ_goal, d_goal and d_max.]
where ρ_goal denotes the reward fuzzy coefficient obtained by feeding the current distance between the unmanned ship and the target point and the current yaw angle of the unmanned ship into the fuzzy logic controller, d_goal denotes the distance between the unmanned ship and the target point at the current moment, and d_max denotes the distance between the initial position of the unmanned ship and the target point;
obstacle avoidance means that an obstacle lies within the detection range of the unmanned ship, and the obstacle-avoidance reward function R_c is expressed as:
[Equation image not reproduced: R_c as a function of ρ_obs, r_det and d_obs.]
where ρ_obs denotes the penalty fuzzy coefficient obtained by feeding the current distance between the unmanned ship and the obstacle and the current yaw angle of the unmanned ship into the fuzzy logic controller, r_det denotes the maximum radius of the detection range, and d_obs denotes the distance between the unmanned ship and the obstacle at the current moment.
5. The adaptive unmanned ship path planning method based on fuzzy sets and deep reinforcement learning of claim 4, wherein the target-point reward function R_end is a constant, wherein when the unmanned ship reaches the target point, the feedback is a reward of R_end; when the unmanned ship reaches the boundary of the environment, the feedback is a penalty of R_end; and when the unmanned ship collides with an obstacle, the feedback is a penalty of R_end.
CN202211171757.8A 2022-09-26 2022-09-26 Self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning Pending CN115599093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211171757.8A CN115599093A (en) 2022-09-26 2022-09-26 Self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211171757.8A CN115599093A (en) 2022-09-26 2022-09-26 Self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115599093A true CN115599093A (en) 2023-01-13

Family

ID=84845515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211171757.8A Pending CN115599093A (en) 2022-09-26 2022-09-26 Self-adaptive unmanned ship path planning method based on fuzzy set and deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115599093A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523154A (en) * 2023-03-22 2023-08-01 中国科学院西北生态环境资源研究院 Model training method, route planning method and related devices
CN116523154B (en) * 2023-03-22 2024-03-29 中国科学院西北生态环境资源研究院 Model training method, route planning method and related devices
CN117742323A (en) * 2023-12-06 2024-03-22 江苏大学 Target distribution and route planning method for multi-agent unmanned ship


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination