CN111998847A - Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning - Google Patents
- Publication number
- CN111998847A CN111998847A CN202010683268.5A CN202010683268A CN111998847A CN 111998847 A CN111998847 A CN 111998847A CN 202010683268 A CN202010683268 A CN 202010683268A CN 111998847 A CN111998847 A CN 111998847A
- Authority
- CN
- China
- Prior art keywords
- geomagnetic
- auv
- target
- value
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/04—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by terrestrial means
- G01C21/08—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by terrestrial means involving use of the magnetic field of the earth
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/10—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
- G01C21/12—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
- G01C21/16—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
- G01C21/165—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
- G01C21/203—Specially adapted for sailing ships
Landscapes
- Engineering & Computer Science (AREA)
- Remote Sensing (AREA)
- Radar, Positioning & Navigation (AREA)
- Automation & Control Theory (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Environmental & Geological Engineering (AREA)
- General Life Sciences & Earth Sciences (AREA)
- Geology (AREA)
- Navigation (AREA)
Abstract
The invention provides a geomagnetic bionic navigation method based on deep reinforcement learning. Without any prior geomagnetic information, a Deep Q-Network algorithm makes the heading decisions that guide an underwater vehicle to a target point, and a boundary function improves the search efficiency of the AUV. As the number of times the AUV reaches the target point increases, the navigation path is gradually shortened and finally approaches the optimal path.
Description
Technical Field
The invention relates to an underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning, and belongs to the technical field of underwater vehicle geomagnetic navigation.
Background
Underwater geomagnetic navigation and positioning is passive, non-radiating, and available at all times and in all regions. It can therefore effectively overcome the shortcomings of traditional underwater navigation systems, such as the time-accumulating error of an inertial navigation system and the rapid attenuation of a satellite navigation system's radio signals in water, which makes it one of the ideal approaches to real-time, continuous, and accurate autonomous underwater navigation for an underwater vehicle.
At present, the main approach to geomagnetic navigation is geomagnetic matching: a geomagnetic sensor mounted on the carrier measures geomagnetic data in real time, and a matching algorithm compares the measurements against a pre-acquired reference database (a geomagnetic map) to determine the carrier's position. This approach relies on an accurate prior geomagnetic map, but in harsh environments such as underwater currents and submerged reefs, acquiring an accurate geomagnetic map is extremely difficult. Some researchers have therefore turned to navigation by magnetic tendency. However, current methods of this kind make heading decisions with a degree of randomness and do not learn: when the AUV must travel back and forth between two places many times (for example, for transport or patrol), it cannot draw on experience from previous rounds to learn a better path.
Disclosure of Invention
To solve the problems of the prior art, the invention provides a geomagnetic bionic navigation method based on deep reinforcement learning. Without any prior geomagnetic information, a Deep Q-Network (DQN) algorithm makes the heading decisions that guide an underwater vehicle to a target point, and a boundary function improves the search efficiency of the AUV. As the number of times the AUV reaches the target point increases, the navigation path is gradually shortened and finally approaches the optimal path.
The technical scheme of the invention is as follows:
the underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning is characterized in that: the method comprises the following steps:
step 1: initialize AUV navigation parameters: set the AUV movement speed v, the sampling period ΔT, the geomagnetic parameters of the initial position and of the target position, and the objective-function threshold μ;
step 2: initialize DQN algorithm parameters: initialize the current Q-network parameters w, the target Q-network parameters w', and the experience replay pool capacity D; initialize the state space and action space: take the geomagnetic parameters at the AUV's position as the state feature s and the AUV's heading angle as the action a, and initialize the action space as:
θ={θ1,θ2,…,θm},m=2π/Δθ
where Δθ is the set heading-angle interval;
step 3: select an action:
the current Q-network takes the AUV's current state s as input, outputs the Q value of every action in the action space, and an ε-greedy strategy selects either the action with the maximum Q value or a random action;
step 4: execute the action:
the AUV executes the action a selected in step 3, moves to the next position according to the kinematic equation, and computes the normalized objective function F(B, k); it obtains the next state s' and the reward r, stores (s, a, r, s') as a sample in the experience replay pool, and updates the state (s = s'); in the reward function r, k denotes a motion instant of the vehicle and bound() is the boundary function;
step 5: judge whether the target position has been reached, using the normalized objective function in the current state; if the objective function is smaller than the threshold μ, the AUV is considered to have reached the target position and navigation ends; otherwise, go to the next step;
step 6: use the boundary function to judge whether the AUV is out of bounds:
if the objective function F(B, k) is greater than or equal to the boundary function bound(k-1), return the AUV to its position at the previous instant; if F(B, k) is smaller than bound(k-1), update the boundary function;
step 7: update the network parameters:
if the number of samples in the experience replay pool does not exceed m, return directly to step 3;
if it exceeds m, randomly draw m samples from the pool, back-propagate the error by gradient descent to update the current Q-network parameters w, and copy w to the target Q-network every N time steps (i.e., set w' = w); then return to step 3, where the loss function of the DQN algorithm is:
Loss = (Target Q - Q(s, a; w))²
Target Q = r + γ max Q(s', a'; w')
where γ is the set discount factor; Q(s, a; w) is the Q value corresponding to action a in the Q-value vector (of the same dimension as the action space) output by the current Q-network when it is fed the state s from the sample (s, a, r, s'); max Q(s', a'; w') is the maximum Q value output by the target Q-network when it is fed the state s' from the sample, and a' is the action that attains this maximum.
Advantageous effects
For the problem of geomagnetic navigation when an AUV performs round-trip tasks between two places, such as patrol and transport, without prior geomagnetic information, the invention provides a geomagnetic bionic navigation algorithm based on deep reinforcement learning.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a deep reinforcement learning geomagnetic navigation algorithm;
FIG. 2 is an AUV navigation track under deep reinforcement learning geomagnetic navigation algorithm;
FIG. 3 is a normalized objective function convergence curve;
fig. 4 is a sub-objective function convergence curve of the three geomagnetic components;
FIG. 5 is an iteration step number statistic of two geomagnetic bionic navigation algorithms, wherein (a) is the algorithm of the invention, and (b) is an evolution search algorithm.
Detailed Description
The invention addresses two problems of the traditional evolutionary search algorithm: its heading-angle selection is largely random, and an AUV performing round-trip tasks between two places, such as patrol and transport, cannot accumulate experience or obtain a better strategy from previous rounds. To this end, a geomagnetic bionic navigation algorithm based on deep reinforcement learning is provided: when the AUV must travel back and forth between two places many times, policy updates yield an ever shorter, more nearly optimal navigation path.
First, an AUV kinematic model is established.
Considering the Autonomous Underwater Vehicle (AUV) as a particle in a two-dimensional plane, the equation of motion is:

x(k+1) = x(k) + v·ΔT·cos θ(k)
y(k+1) = y(k) + v·ΔT·sin θ(k)

where k denotes a motion instant of the vehicle, ΔT is the sampling period, v is the movement speed of the vehicle, and θ is the heading angle.
In the present invention, both ΔT and v are set as constants, so the equation of motion can be simplified to:

x(k+1) = x(k) + L·cos θ(k)
y(k+1) = y(k) + L·sin θ(k)

where L = v·ΔT is the motion step length of the AUV.
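As a concrete illustration, the simplified kinematics can be sketched in Python (the function name and tuple convention are illustrative, not from the patent):

```python
import math

def step(x, y, theta, L=500.0):
    # One AUV motion step: advance by step length L = v * dT along heading theta (radians)
    return x + L * math.cos(theta), y + L * math.sin(theta)
```

For example, from the origin with heading 0 the AUV moves to (500, 0) after one step.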
The principle of the geomagnetic bionic navigation problem is as follows.
The geomagnetic field is a combination of multiple geomagnetic components and can be expressed as B = {B1, B2, …, Bn}. Seven geomagnetic components are currently observable (the total field strength, the east, north, vertical, and horizontal components, the magnetic inclination, and the magnetic declination), and they can provide position and heading information for navigation.
From a biological point of view, organisms navigating with the aid of the geomagnetic field are sensitive to magnetic tendency. The process of geomagnetic bionic navigation is therefore one of converging multiple geomagnetic parameters to their sub-target values, i.e. a multi-objective search problem. Accordingly, geomagnetic bionic navigation can be expressed in the following form:
where B is the geomagnetic parameter vector; F(B, k) is the total objective function at time k; Bi(k) is the i-th geomagnetic parameter at the position occupied at time k; Bi(T) is the i-th geomagnetic parameter of the target position; θk is the heading angle at time k; gi is a constraint relation; and fi(B, k), shown below, is the objective sub-function of the i-th geomagnetic parameter at time k.
Since the magnitudes of the geomagnetic parameters differ, the parameters are normalized, and the multi-parameter objective function can be expressed as:
where Bi(0) is the i-th geomagnetic parameter of the starting position. The goal of geomagnetic bionic navigation is to make the geomagnetic parameters converge to their target values while the objective function approaches zero, i.e.:
in actual navigation, when the relative distance between the AUV and the target position is within a certain range, that is, the objective function is smaller than a certain threshold, the AUV is considered to be successfully navigated to the target position, that is:
F(B,k)<μ
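The normalization described above can be sketched in Python. The exact functional form is not reproduced in the text, so the squared sub-objectives and the 1/n averaging below are assumptions; what the sketch preserves is that every sub-objective starts at 1 and reaches 0 at the target:

```python
def objective(B_k, B_T, B_0):
    # Normalized objective F(B, k): each component's current offset from the target,
    # scaled by its initial offset. Squared form and 1/n averaging are assumptions.
    n = len(B_k)
    return sum(((bk - bt) / (b0 - bt)) ** 2
               for bk, bt, b0 in zip(B_k, B_T, B_0)) / n
```

Navigation terminates once this value drops below the threshold μ.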
Based on this principle, the algorithm of the invention realizes geomagnetic bionic navigation in the following specific steps:
Step 1: initialize the AUV navigation parameters.
Set the AUV motion step length L, the geomagnetic parameters of the initial position and of the target position, and the objective-function threshold μ.
Step 2: initialize the parameters of the DQN algorithm.
Initialize the current Q-network parameters w, the target Q-network parameters w', and the experience replay pool capacity D. Initialize the state space and action space: taking the geomagnetic parameters at the AUV position as the state feature s and the AUV heading angle as the action a, the action space is initialized as:
θ={θ1,θ2,…,θm},m=2π/Δθ
where Δθ is the set heading-angle interval.
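The discretized action space can be sketched in Python (the helper name is illustrative):

```python
import math

def make_action_space(d_theta):
    # Discretize the heading angle into m = 2*pi / d_theta candidate actions
    m = round(2 * math.pi / d_theta)
    return [i * d_theta for i in range(m)]
```

With the patent's Δθ = 45° this yields m = 8 candidate headings.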
And step 3: an action is selected.
The current Q value network takes the AUV current state s as input, respectively outputs Q values of all actions in the action space, and selects the action with the maximum Q value or selects random actions through epsilon-greedy strategy, namely selects the action with the maximum Q value with probability of (0< <1) or selects random actions with probability of 1-, for example, 90% of the probability is the action with the maximum Q value, and 10% of the probability is the random action.
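The ε-greedy selection in step 3 can be sketched as follows (the function name is illustrative; in the actual method the Q values come from the current Q-network):

```python
import random

def epsilon_greedy(q_values, epsilon=0.9, rng=random):
    # With probability epsilon exploit (index of the maximum Q value);
    # otherwise explore with a uniformly random action index.
    if rng.random() < epsilon:
        return max(range(len(q_values)), key=lambda i: q_values[i])
    return rng.randrange(len(q_values))
```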
And 4, step 4: an action is performed.
The AUV performs action a, moves to the next position according to the kinematic equation and calculates the normalized objective function F (B, k), obtains the next state s ' and the reward r, stores (s, a, r, s ') as a set of samples in the empirical playback pool, and updates the state, let s be s '. The AUV kinematic equation, the reward function r, is as follows:
wherein k represents a certain moving moment of the aircraft, Δ T represents a sampling period, v represents the moving speed of the aircraft, and θ represents the heading angle of the aircraft.
Where bound () is a boundary function.
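The experience replay pool used in step 4 can be sketched as a fixed-capacity buffer (the class name is illustrative):

```python
import random
from collections import deque

class ReplayPool:
    # Fixed-capacity experience replay: the oldest samples are evicted as new
    # ones arrive, and minibatches are drawn uniformly at random for the update.
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, m):
        return random.sample(list(self.buffer), m)

    def __len__(self):
        return len(self.buffer)
```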
When the AUV performs geomagnetic navigation by deep reinforcement learning, the unexplored state space is huge, and the randomly initialized DQN cannot provide a correct heading at first, so the AUV would need a long exploration process to find the target point.
Many states, however, never need to be explored, because they cannot become part of the optimal path; they are invalid states. A boundary function is therefore constructed to limit the AUV's exploration range, which markedly improves its search efficiency.
The normalized objective function measures the difference between the geomagnetic parameters of the current position and those of the target position, and thus indirectly reflects the relative distance between the AUV and the target. At each instant, the region in which the objective function decreases can be regarded as a disc centered on the target position whose radius is the current AUV-target distance. Exploring toward the inside of this disc brings the AUV closer to the target, while exploring outside it necessarily moves the AUV away; the region outside the disc therefore consists of invalid states. The boundary function is set accordingly:
under the limit of the boundary function, when the exploration position of the AUV at the next moment is out of the boundary, the AUV returns to the position at the previous moment, thereby avoiding useless search. Therefore, the AUV only continuously searches the area in the circle determined by the boundary function, and the searching efficiency and the learning efficiency of the navigation algorithm are greatly improved.
And 5: and judging whether the target position is reached according to the normalized target function in the current state. And if the target function is smaller than the threshold value mu, the AUV is considered to reach the target position, and the navigation is finished. Otherwise, the next step is entered. The judgment formula is as follows:
F(B,k)<μ
Step 6: use the boundary function to judge whether the AUV is out of bounds. If the objective function F(B, k) is greater than or equal to the boundary function bound(k-1), return the AUV to its position at the previous instant; if F(B, k) is smaller than bound(k-1), update the boundary function.
And 7: and updating the network parameters.
If the number of the sample groups in the experience playback pool is not more than m, directly returning to the step 3;
if the number of the sample groups in the empirical playback pool is greater than m, randomly extracting m groups of samples from the empirical playback pool, reversely propagating errors by using a gradient descent method, updating the current Q value network parameter w, copying the current Q value network parameter w to the target Q value network every N time steps, namely making w' ═ w, and then returning to the step 3, wherein when the parameters of the current Q value network are updated, the loss function of the DQN algorithm is as follows:
Loss = (Target Q - Q(s, a; w))²
Target Q = r + γ max Q(s', a'; w')
where γ is the set discount factor; Q(s, a; w) is the Q value corresponding to action a in the Q-value vector (of the same dimension as the action space) output by the current Q-network when it is fed the state s from the sample (s, a, r, s'); max Q(s', a'; w') is the maximum Q value output by the target Q-network when it is fed the state s' from the sample, and a' is the action that attains this maximum. By updating the current Q-network as navigation proceeds, the network learns to make better decisions.
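The target-Q and loss computations can be sketched per sample (the function names are illustrative; in the actual method Q(s, a; w) and Q(s', a'; w') are network outputs):

```python
def td_target(r, q_next, gamma=0.9):
    # Target Q = r + gamma * max_a' Q(s', a'; w'), with q_next the target
    # network's Q-value vector for state s'
    return r + gamma * max(q_next)

def dqn_loss(q_sa, target_q):
    # Loss = (Target Q - Q(s, a; w))^2, the squared TD error
    return (target_q - q_sa) ** 2
```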
To verify the feasibility and superiority of the algorithm, a simulation experiment was performed in Python 3, with the International Geomagnetic Reference Field (IGRF-12) providing the required multi-parameter geomagnetic data.
The specific implementation is as follows:
(1) Geomagnetic feature selection. Three mutually independent parameters are chosen from the geomagnetic feature quantities as navigation features: the geomagnetic north component Bx, the geomagnetic east component By, and the geomagnetic vertical component Bz.
(2) AUV and geomagnetic bionic navigation parameter initialization. The three geomagnetic feature values at the AUV's initial position are Bx = 22270 nT, By = -1993 nT, Bz = -9732 nT; at the target position they are Bx = 22583 nT, By = -1881 nT, Bz = -9228 nT. The AUV motion step length is L = 500 m, the heading interval angle is Δθ = 45°, and the navigation termination threshold is μ = 0.0005.
(3) DQN algorithm parameter initialization. The learning rate is α = 0.0001, the discount factor γ = 0.9, the ε-greedy parameter ε = 0.9, the experience replay pool capacity D = 500000, the sample batch size m = 64, and the target Q-network update period N = 200. Both the current Q-network parameters w and the target Q-network parameters w' are initialized orthogonally.
(4) The AUV completes the geomagnetic navigation task using the DQN geomagnetic bionic navigation algorithm.
The navigation trajectory of the algorithm of the invention is shown in FIG. 2, which plots the AUV's trajectory after the DQN algorithm has converged. The whole track is relatively straight and reaches the target position almost in a straight line, showing that the DQN algorithm has learned a good navigation strategy and guides the AUV to the target along a nearly direct path.
FIG. 3 shows the convergence of the normalized objective function under the algorithm of the invention. The convergence is relatively smooth, and the objective function reaches the target value in 95 steps, i.e. the AUV navigates in 95 steps. Convergence is fast in the initial stage, slows as the AUV approaches the target position, and finally reaches the target value.
FIG. 4 shows the convergence of the sub-objective functions of the three geomagnetic components under the algorithm of the invention. All three converge smoothly and almost in unison, with the east component slightly faster than the other two. As navigation progresses, the three sub-objective functions converge to their target values simultaneously.
FIG. 5 plots the iteration-step statistics of the two navigation algorithms over multiple rounds of experiments. Both algorithms use the same motion step length, so the step count reflects the navigation path length. Panel (a), for the algorithm of the invention, shows that the AUV needs about 650 search steps to reach the target position the first time; the required number of steps then decreases as the number of arrivals grows, finally converging to about 90 steps. Panel (b), for the evolutionary search algorithm, shows strongly random and volatile navigation; the red dotted line marks its average of about 260 iteration steps. Comparing the two panels: although the initial step count of the proposed algorithm exceeds the maximum of the evolutionary search algorithm, it quickly converges to about 90 steps as the round-trip task proceeds, far below the evolutionary search average of 260 steps. The algorithm therefore navigates successfully to the target position without prior geomagnetic information and is clearly superior in AUV round-trip tasks.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.
Claims (4)
1. An underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning is characterized in that: the method comprises the following steps:
step 1: initialize AUV navigation parameters: set the AUV movement speed v, the sampling period ΔT, the geomagnetic parameters of the initial position and of the target position, and the objective-function threshold μ;
step 2: initialize DQN algorithm parameters: initialize the current Q-network parameters w, the target Q-network parameters w', and the experience replay pool capacity D; initialize the state space and action space: take the geomagnetic parameters at the AUV's position as the state feature s and the AUV's heading angle as the action a, and initialize the action space as:
θ={θ1,θ2,…,θm},m=2π/Δθ
where Δθ is the set heading-angle interval;
step 3: select an action:
the current Q-network takes the AUV's current state s as input, outputs the Q value of every action in the action space, and an ε-greedy strategy selects either the action with the maximum Q value or a random action;
step 4: execute the action:
the AUV executes the action a selected in step 3, moves to the next position according to the kinematic equation, and computes the normalized objective function F(B, k); it obtains the next state s' and the reward r, stores (s, a, r, s') as a sample in the experience replay pool, and updates the state (s = s'); in the reward function r, k denotes a motion instant of the vehicle and bound() is the boundary function;
step 5: judge whether the target position has been reached, using the normalized objective function in the current state; if the objective function is smaller than the threshold μ, the AUV is considered to have reached the target position and navigation ends; otherwise, go to the next step;
step 6: use the boundary function to judge whether the AUV is out of bounds:
if the objective function F(B, k) is greater than or equal to the boundary function bound(k-1), return the AUV to its position at the previous instant; if F(B, k) is smaller than bound(k-1), update the boundary function;
step 7: update the network parameters:
if the number of samples in the experience replay pool does not exceed m, return directly to step 3;
if it exceeds m, randomly draw m samples from the pool, back-propagate the error by gradient descent to update the current Q-network parameters w, and copy w to the target Q-network every N time steps (i.e., set w' = w); then return to step 3, where the loss function of the DQN algorithm is:
Loss = (Target Q - Q(s, a; w))²
Target Q = r + γ max Q(s', a'; w')
where γ is the set discount factor; Q(s, a; w) is the Q value corresponding to action a in the Q-value vector (of the same dimension as the action space) output by the current Q-network when it is fed the state s from the sample (s, a, r, s'); max Q(s', a'; w') is the maximum Q value output by the target Q-network when it is fed the state s' from the sample, and a' is the action that attains this maximum.
2. The underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning according to claim 1, wherein: the geomagnetic parameter selected in the step 1 is one or more of total magnetic field strength, east component, north component, vertical component, horizontal component, magnetic dip angle and magnetic declination.
3. The underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning according to claim 2, wherein: the geomagnetic parameters selected in the step 1 are an east component, a north component and a vertical component.
4. The underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning according to claim 1, wherein the objective function F(B, k) is normalized according to the formula
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010683268.5A CN111998847A (en) | 2020-07-16 | 2020-07-16 | Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111998847A true CN111998847A (en) | 2020-11-27 |
Family
ID=73468211
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010683268.5A Pending CN111998847A (en) | 2020-07-16 | 2020-07-16 | Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111998847A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115809609A (en) * | 2023-02-06 | 2023-03-17 | Jilin University | Target search method and system for multiple autonomous underwater vehicles |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107976188A (en) * | 2017-10-12 | 2018-05-01 | Zhejiang University | AUV homing and docking navigation method based on acoustic ranging communication |
US20180129213A1 (en) * | 2016-11-09 | 2018-05-10 | Johnson Outdoors Inc. | System and method for automatically navigating a charted contour |
CN108375379A (en) * | 2018-02-01 | 2018-08-07 | University of Shanghai for Science and Technology | Fast path planning method based on variation dual DQN, and mobile robot |
CN108698677A (en) * | 2015-12-09 | 2018-10-23 | 国立研究开发法人 海上·港湾·航空技术研究所 | The method for setting path of underwater sailing body, using this method underwater sailing body optimum controling method and underwater sailing body |
CN109724592A (en) * | 2019-03-03 | 2019-05-07 | 西北工业大学 | AUV geomagnetic bionic navigation method based on evolutionary gradient search |
CN110673637A (en) * | 2019-10-08 | 2020-01-10 | 福建工程学院 | Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning |
US20200074236A1 (en) * | 2018-08-31 | 2020-03-05 | Hitachi, Ltd. | Reward function generation method and computer system |
CN111307143A (en) * | 2020-02-17 | 2020-06-19 | 东南大学 | Bionic navigation algorithm for multi-target evolution search based on geomagnetic gradient assistance |
- 2020-07-16: CN application CN202010683268.5A filed (publication CN111998847A); legal status: Pending
Non-Patent Citations (2)
Title |
---|
Cong Wang, Yun Niu, Mingyong Liu, et al.: "Geomagnetic navigation for AUV based on deep reinforcement learning algorithm", 2019 IEEE International Conference on Robotics and Biomimetics, 8 December 2019 (2019-12-08), pages 2571-2575, XP033691548, DOI: 10.1109/ROBIO49542.2019.8961491 * |
Shen Haiqing, Guo Chen, Li Tieshan, et al.: "Intelligent collision avoidance navigation method for unmanned ships considering navigation experience rules", Journal of Harbin Engineering University, vol. 39, no. 6, 30 June 2018 (2018-06-30) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108803321B (en) | Autonomous underwater vehicle track tracking control method based on deep reinforcement learning | |
US11727812B2 (en) | Airplane flight path planning method and device based on the pigeon-inspired optimization | |
CN112650237B (en) | Ship path planning method and device based on clustering processing and artificial potential field | |
CN109254591B (en) | Dynamic track planning method based on Anytime repair-type sparse A* and Kalman filtering | |
CN109724592B (en) | AUV geomagnetic bionic navigation method based on evolutionary gradient search | |
CN114625151B (en) | Underwater robot obstacle avoidance path planning method based on reinforcement learning | |
CN103776453B (en) | Multi-model scale underwater vehicle integrated navigation filtering method | |
CN111307143B (en) | Bionic navigation algorithm for multi-target evolution search based on geomagnetic gradient assistance | |
CN111381600B (en) | UUV path planning method based on particle swarm optimization | |
CN110849355B (en) | Bionic navigation method for geomagnetic multi-parameter multi-target rapid convergence | |
CN111207752B (en) | Unmanned aerial vehicle track planning method based on dynamic tangent point adjustment | |
CN101520328A (en) | Method for autonomous navigation using geomagnetic field line map | |
CN116263335A (en) | Indoor navigation method based on vision and radar information fusion and reinforcement learning | |
CN106338919A (en) | USV (Unmanned Surface Vehicle) track tracking control method based on a reinforcement-learning intelligent algorithm | |
CN111338350A (en) | Unmanned ship path planning method and system based on greedy mechanism particle swarm algorithm | |
CN108897338A (en) | PIO-based anti-collision path planning method for circular-orbit spacecraft formation reconstruction | |
CN111930141A (en) | Three-dimensional path visual tracking method for underwater robot | |
CN110617819A (en) | Unmanned aerial vehicle terrain auxiliary navigation method based on ant colony algorithm path planning | |
CN109858137A (en) | It is a kind of based on the complicated maneuvering-vehicle track estimation method that can learn Extended Kalman filter | |
CN113805609A (en) | Unmanned aerial vehicle group target searching method based on chaos lost pigeon group optimization mechanism | |
CN116088576A (en) | Unmanned aerial vehicle three-dimensional path planning method based on improved whale algorithm | |
CN117606490B (en) | Collaborative search path planning method for autonomous underwater vehicle | |
CN111998847A (en) | Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning | |
CN113325856B (en) | UUV optimal operation path planning method based on countercurrent approximation strategy | |
CN114088098B (en) | Auxiliary navigation path planning method for polar region underwater vehicle database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2020-11-27 |