CN111998847A - Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning - Google Patents

Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning

Info

Publication number
CN111998847A
Authority
CN
China
Prior art keywords
geomagnetic
auv
target
value
action
Prior art date
Legal status
Pending
Application number
CN202010683268.5A
Other languages
Chinese (zh)
Inventor
刘明雍
王聪
牛云
李嘉琦
汪培新
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010683268.5A
Publication of CN111998847A


Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/04: Navigation by terrestrial means
    • G01C21/08: Navigation by terrestrial means involving use of the magnetic field of the earth
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10: Navigation by using measurements of speed or acceleration
    • G01C21/12: Navigation by using measurements of speed or acceleration executed aboard the object being navigated; dead reckoning
    • G01C21/16: Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165: Inertial navigation combined with non-inertial navigation instruments
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20: Instruments for performing navigational calculations
    • G01C21/203: Instruments for performing navigational calculations specially adapted for sailing ships

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • Geology (AREA)
  • Navigation (AREA)

Abstract

The invention provides a geomagnetic bionic navigation method based on deep reinforcement learning. Without any prior geomagnetic information, a Deep Q-Network algorithm makes the heading decisions that guide an underwater vehicle to a target point, and a boundary function improves the search efficiency of the AUV. As the number of times the AUV reaches the target point increases, the navigation path gradually shortens and finally approaches the optimal path.

Description

Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning
Technical Field
The invention relates to an underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning, and belongs to the technical field of underwater vehicle geomagnetic navigation.
Background
Underwater geomagnetic navigation and positioning is passive, non-radiating, and available at all times and in all regions. It can effectively overcome the defects of traditional underwater navigation systems, such as the error of an inertial navigation system that accumulates over time and the rapid attenuation in water of the radio signals on which satellite navigation depends. It is therefore one of the ideal approaches to real-time, continuous, and accurate autonomous underwater navigation for an underwater vehicle.
At present, the main direction of geomagnetic navigation is geomagnetic matching navigation: geomagnetic data are measured in real time by a geomagnetic sensor installed on the carrier, and the measurements are matched against a pre-acquired reference database (a geomagnetic map) by a matching algorithm to determine the carrier's position. The method relies on an accurate prior geomagnetic map, but in severe environments with ocean currents, submerged reefs, and the like, acquiring an accurate geomagnetic map is extremely difficult. Some researchers have therefore turned to navigation by magnetic tendency; however, existing methods of this kind make heading decisions with considerable randomness and do not learn. When the AUV must travel back and forth between two places many times (for example, for transportation or patrol), it cannot draw on experience from earlier rounds and cannot learn a better path.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a geomagnetic bionic navigation method based on deep reinforcement learning. Without any prior geomagnetic information, a Deep Q-Network (DQN) algorithm makes the heading decisions that guide an underwater vehicle to a target point, and a boundary function improves the search efficiency of the AUV. As the number of times the AUV reaches the target point increases, the navigation path gradually shortens and finally approaches the optimal path.
The technical scheme of the invention is as follows:
the underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning is characterized in that: the method comprises the following steps:
step 1: initializing AUV navigation parameters: setting the movement speed v of the AUV, a sampling period delta T, the geomagnetic parameter of the initial position, the geomagnetic parameter of the target position and the threshold value mu of the target function;
step 2: initializing DQN algorithm parameters: initializing a current Q value network parameter w, a target Q value network parameter w' and an experience playback pool capacity D; initialization state space and action space: taking the geomagnetic parameter of the position of the AUV as a state characteristic s, taking the course angle of the AUV as an action a, and initializing an action space as follows:
θ={θ12,…,θm},m=2π/Δθ
delta theta is a set course angle interval;
step 3: select an action:
the current Q-value network takes the AUV's current state s as input, outputs the Q values of all actions in the action space, and, through an ε-greedy strategy, either selects the action with the maximum Q value or selects a random action;
step 4: execute the action:
the AUV executes the action a selected in step 3, moves to the next position according to the kinematic equation, and calculates the normalized objective function F(B,k); it obtains the next state s' and the reward r, stores (s, a, r, s') as a sample in the experience replay pool, and updates the state by letting s = s', where the reward function r is:
r = { r_a,  if F(B,k) < μ            (target reached)
      r_b,  if F(B,k) ≥ bound(k−1)   (out of bounds)
      0,    otherwise }
where k denotes a movement time of the vehicle, bound() is the boundary function defined in step 6, and r_a > 0, r_b < 0 are set reward constants;
step 5: judge whether the target position has been reached, using the normalized objective function in the current state; if the objective function is smaller than the threshold μ, the AUV is considered to have reached the target position and navigation ends; otherwise, go to the next step;
step 6: judge whether the AUV is out of bounds, using the boundary function
bound(k) = { F(B,k),      if F(B,k) < bound(k−1)
             bound(k−1),  otherwise }
if the objective function F(B,k) is greater than or equal to the boundary function bound(k−1), the AUV returns to its position at the previous time; if F(B,k) is smaller than bound(k−1), the boundary function is updated as above;
step 7: update the network parameters:
if the number of sample groups in the experience replay pool is not greater than m, return directly to step 3;
if the number of sample groups in the experience replay pool is greater than m, randomly draw m sample groups from the pool, back-propagate the error with gradient descent, and update the current Q-value network parameters w; every N time steps, copy the current Q-value network parameters to the target Q-value network, i.e., set w' = w, and then return to step 3, where the loss function of the DQN algorithm is:
Loss = (TargetQ − Q(s, a; w))²
TargetQ = r + γ·max_{a'} Q(s', a'; w')
where γ is the set discount factor; Q(s, a; w) is the Q value corresponding to action a in the Q-value vector (of the same dimension as the action space) output by the current Q-value network when the state s from the sample (s, a, r, s') is input; max_{a'} Q(s', a'; w') is the maximum Q value in the vector output by the target Q-value network when the state s' from the sample is input, and a' is the action achieving that maximum.
Advantageous effects
Aiming at the problem of geomagnetic navigation when an AUV performs two-place round-trip tasks such as patrol and transportation without prior geomagnetic information, the invention provides a geomagnetic bionic navigation algorithm based on deep reinforcement learning.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of the deep reinforcement learning geomagnetic navigation algorithm;
FIG. 2 shows the AUV navigation trajectory under the deep reinforcement learning geomagnetic navigation algorithm;
FIG. 3 is the convergence curve of the normalized objective function;
FIG. 4 shows the convergence curves of the sub-objective functions of the three geomagnetic components;
FIG. 5 shows the iteration-step statistics of two geomagnetic bionic navigation algorithms, where (a) is the algorithm of the invention and (b) is the evolutionary search algorithm.
Detailed Description
The traditional evolutionary search algorithm selects heading angles with considerable randomness, and when an AUV performs two-place round-trip tasks such as patrol and transportation it can neither accumulate experience nor acquire a better strategy. To solve these problems, the invention provides a geomagnetic bionic navigation algorithm based on deep reinforcement learning. When the AUV needs to travel back and forth between two places many times, a better and shorter navigation path is obtained by updating the strategy.
First, an AUV kinematic model is established.
Considering an Autonomous Underwater Vehicle (AUV) as a particle in a two-dimensional plane, the equation of motion is:
x(k+1) = x(k) + v·ΔT·cos θ(k)
y(k+1) = y(k) + v·ΔT·sin θ(k)
where k denotes a movement time of the vehicle, ΔT is the sampling period, v is the vehicle's motion speed, and θ is its heading angle.
In the present invention, both ΔT and v are set as constants, so the equation of motion can be simplified to:
x(k+1) = x(k) + L·cos θ(k)
y(k+1) = y(k) + L·sin θ(k)
where L = v·ΔT is the motion step length of the AUV.
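As an illustration of this particle model, a minimal Python sketch follows (the function name and interface are ours, not the patent's; the default L = 500 m matches the simulation described later):

    import math

    def kinematic_step(x, y, theta, L=500.0):
        # One motion step of the AUV particle model:
        # x(k+1) = x(k) + L*cos(theta_k), y(k+1) = y(k) + L*sin(theta_k)
        return x + L * math.cos(theta), y + L * math.sin(theta)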
The following describes the principle of the geomagnetic bionic navigation problem.
The geomagnetic field is composed of multiple geomagnetic components and can be expressed as B = {B_1, B_2, …, B_n}. Seven geomagnetic components can currently be observed (total field intensity, east component, north component, vertical component, horizontal component, magnetic inclination, and magnetic declination), and they can provide position and direction information for navigation.
From a biological point of view, living beings navigating with the aid of the geomagnetic field are sensitive to magnetic tendencies. The process of geomagnetic bionic navigation is therefore a process of converging multiple geomagnetic parameters to their sub-target values, i.e., a multi-objective search problem. Accordingly, geomagnetic bionic navigation can be expressed in the following form:
min_{θ_k} F(B,k),  F(B,k) = Σ_{i=1}^{n} f_i(B,k),  s.t. g_i, i = 1, 2, …, n
where B denotes the geomagnetic parameters; F(B,k) is the total objective function at the kth time; B_k^i is the ith geomagnetic parameter of the position at the kth time; B_T^i is the ith geomagnetic parameter of the target position; θ_k is the heading angle at the kth time, which serves as the decision variable; g_i is the constraint relationship; and f_i(B,k) is the objective sub-function of the ith geomagnetic parameter at the kth time, as shown below.
f_i(B,k) = (B_k^i − B_T^i)²
Since the geomagnetic parameters differ in magnitude, they are normalized, and the multi-parameter objective function can be expressed as:
F(B,k) = (1/n) · Σ_{i=1}^{n} ((B_k^i − B_T^i) / (B_0^i − B_T^i))²
where B_0^i is the ith geomagnetic parameter of the starting position. The goal of geomagnetic bionic navigation is to converge the geomagnetic parameters to their target values while the objective function approaches zero, i.e.:
lim_{k→∞} F(B,k) = 0
In actual navigation, when the relative distance between the AUV and the target position falls within a certain range, i.e., the objective function is smaller than a certain threshold, the AUV is considered to have navigated to the target position successfully, i.e.:
F(B,k) < μ
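A minimal Python sketch of this normalized objective and the arrival test, assuming the form of F(B,k) reconstructed above (helper names are illustrative):

    import numpy as np

    def objective(B_k, B_T, B_0):
        # F(B,k) = (1/n) * sum_i ((B_k^i - B_T^i) / (B_0^i - B_T^i))^2
        B_k, B_T, B_0 = map(np.asarray, (B_k, B_T, B_0))
        return float(np.mean(((B_k - B_T) / (B_0 - B_T)) ** 2))

    def reached_target(B_k, B_T, B_0, mu=0.0005):
        # Navigation succeeds when F(B,k) < mu
        return objective(B_k, B_T, B_0) < mu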
based on the principle, the algorithm of the invention is used for realizing geomagnetic bionic navigation, and comprises the following specific steps:
step 1: and initializing AUV navigation parameters.
And setting the motion step length L of the AUV, the geomagnetic parameter of the initial position, the geomagnetic parameter of the target position and the threshold value mu of the target function.
Step 2: Initialize the relevant parameters of the DQN algorithm.
Initialize the current Q-value network parameters w, the target Q-value network parameters w', and the experience replay pool capacity D. Initialize the state space and action space: taking the geomagnetic parameters at the AUV's position as the state feature s and the AUV's heading angle as the action a, initialize the action space as:
θ = {θ_1, θ_2, …, θ_m},  m = 2π/Δθ
where Δθ is the set heading angle interval.
Step 3: Select an action.
The current Q-value network takes the AUV's current state s as input and outputs the Q values of all actions in the action space. An ε-greedy strategy then selects the action with the maximum Q value with probability ε (0 < ε < 1) or a random action with probability 1 − ε; for example, with ε = 0.9, the maximum-Q action is chosen 90% of the time and a random action 10% of the time.
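A minimal Python sketch of the action-space discretization of step 2 and this ε-greedy selection rule (helper names are ours, not the patent's):

    import numpy as np

    def build_action_space(delta_theta):
        # theta = {theta_1, ..., theta_m}, m = 2*pi / delta_theta
        m = int(round(2 * np.pi / delta_theta))
        return np.arange(m) * delta_theta

    def select_action(q_values, epsilon=0.9):
        # With probability epsilon take the greedy action (maximum Q value);
        # with probability 1 - epsilon take a uniformly random action.
        if np.random.rand() < epsilon:
            return int(np.argmax(q_values))
        return int(np.random.randint(len(q_values)))

    actions = build_action_space(np.deg2rad(45.0))  # 8 candidate headings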
Step 4: Execute the action.
The AUV executes action a, moves to the next position according to the kinematic equation, and calculates the normalized objective function F(B,k); it obtains the next state s' and the reward r, stores (s, a, r, s') as a sample in the experience replay pool, and updates the state by letting s = s'. The AUV kinematic equation and the reward function r are as follows:
x(k+1) = x(k) + v·ΔT·cos θ(k)
y(k+1) = y(k) + v·ΔT·sin θ(k)
where k denotes a movement time of the vehicle, ΔT is the sampling period, v is the vehicle's motion speed, and θ is its heading angle. The reward is
r = { r_a,  if F(B,k) < μ            (target reached)
      r_b,  if F(B,k) ≥ bound(k−1)   (out of bounds)
      0,    otherwise }
where bound() is the boundary function defined below and r_a > 0, r_b < 0 are set reward constants.
When the AUV performs geomagnetic navigation with deep reinforcement learning, there is a huge unexplored state space, and the DQN network parameters are randomly initialized, so at first the network cannot provide a correct heading. The AUV therefore needs a long exploration process to find the target point.
During exploration, many states never become part of the optimal path and thus need not be explored; these are invalid states. A boundary function is therefore constructed to limit the AUV's exploration range, which markedly improves its search efficiency.
The normalized objective function expresses the difference between the geomagnetic parameters of the current position and those of the target position, and thus indirectly reflects the relative distance between the AUV and the target. At each moment, the region where the objective function decreases can be regarded as a circle centered on the target position whose radius is the current AUV-target distance. Exploring toward the inside of this circle brings the AUV closer to the target; exploring outside the circle inevitably moves it away, i.e., the region outside the circle consists of invalid states. The boundary function is set as follows:
bound(k) = { F(B,k),      if F(B,k) < bound(k−1)
             bound(k−1),  otherwise }
under the limit of the boundary function, when the exploration position of the AUV at the next moment is out of the boundary, the AUV returns to the position at the previous moment, thereby avoiding useless search. Therefore, the AUV only continuously searches the area in the circle determined by the boundary function, and the searching efficiency and the learning efficiency of the navigation algorithm are greatly improved.
Step 5: Judge whether the target position has been reached, using the normalized objective function in the current state. If the objective function is smaller than the threshold μ, the AUV is considered to have reached the target position and navigation ends; otherwise, go to the next step. The criterion is:
F(B,k) < μ
Step 6: Judge whether the AUV is out of bounds, using the boundary function. If the objective function F(B,k) is greater than or equal to the boundary function bound(k−1), the AUV returns to its position at the previous time; if F(B,k) is smaller than bound(k−1), the boundary function is updated.
Step 7: Update the network parameters.
If the number of sample groups in the experience replay pool is not greater than m, return directly to step 3;
if the number of the sample groups in the empirical playback pool is greater than m, randomly extracting m groups of samples from the empirical playback pool, reversely propagating errors by using a gradient descent method, updating the current Q value network parameter w, copying the current Q value network parameter w to the target Q value network every N time steps, namely making w' ═ w, and then returning to the step 3, wherein when the parameters of the current Q value network are updated, the loss function of the DQN algorithm is as follows:
Loss=(Target Q-Q(s,a;w))2
Target Q=r+γmaxQ(s',a';w')
wherein gamma is a set discount factor, Q (s, a; w) refers to a Q value corresponding to the action a in a Q value vector which is output by a network and has the same dimension with the action space, wherein the state s in the sample (s, a, s', r) is input into a current Q value network; maxQ (s ', a'; w ') refers to the action corresponding to the maximum Q value, where the state s' in the sample (s, a, s ', r) is input to the target Q value network, and a' is the maximum Q value in the Q value vector output by the network. By means of navigation and updating of the current Q value network in the navigation process, the current Q value network can make a more optimal decision.
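As one possible realization of this update step, here is a PyTorch sketch; the network architecture and optimizer choice are our assumptions, since the patent fixes only the loss, the target network, the sampling scheme, and the hyperparameters listed later:

    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        # Maps the n geomagnetic parameters (state s) to m action Q values.
        def __init__(self, n_state=3, n_action=8, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_state, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_action))

        def forward(self, s):
            return self.net(s)

    q_net, target_net = QNet(), QNet()
    target_net.load_state_dict(q_net.state_dict())  # w' = w
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
    gamma = 0.9

    def dqn_update(s, a, r, s_next):
        # One gradient step on Loss = (TargetQ - Q(s,a;w))^2 over a batch of
        # m samples; s, s_next are float tensors [m, n_state], a is an int64
        # tensor [m], r is a float tensor [m].
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s,a;w)
        with torch.no_grad():                                 # fixed target
            target_q = r + gamma * target_net(s_next).max(dim=1).values
        loss = nn.functional.mse_loss(q_sa, target_q)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Every N time steps, calling target_net.load_state_dict(q_net.state_dict()) repeats the copy w' = w described above.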
To verify the feasibility and superiority of the algorithm, a simulation experiment was performed in Python 3, with the International Geomagnetic Reference Field (IGRF-12) providing the required multi-component geomagnetic data.
The specific implementation mode is as follows:
(1) Geomagnetic feature selection. Three mutually independent parameters are selected from the various geomagnetic feature quantities, namely the geomagnetic north component Bx, the geomagnetic east component By, and the geomagnetic vertical component Bz, as the navigation feature quantities.
(2) Initialization of the AUV parameters and geomagnetic bionic navigation parameters. The three geomagnetic feature quantities at the AUV's initial position are Bx = 22270 nT, By = −1993 nT, and Bz = −9732 nT; at the target position they are Bx = 22583 nT, By = −1881 nT, and Bz = −9228 nT. The AUV motion step is L = 500 m, the heading interval angle is Δθ = 45°, and the navigation end threshold is μ = 0.0005.
(3) DQN algorithm parameter initialization. The learning rate is α = 0.0001, the discount factor is γ = 0.9, the ε-greedy parameter is ε = 0.9, the experience replay pool capacity is D = 500000, the sample batch size is m = 64, and the target Q-value network update period is N = 200. Both the current Q-value network parameters w and the target Q-value network parameters w' are orthogonally initialized.
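The settings of (2) and (3) can be collected into one configuration for the simulation; a Python sketch (the packaging is ours, the values are those stated above):

    params = dict(
        B_start=(22270.0, -1993.0, -9732.0),   # Bx, By, Bz at the start, nT
        B_target=(22583.0, -1881.0, -9228.0),  # Bx, By, Bz at the target, nT
        step_L=500.0,            # motion step length, m
        delta_theta_deg=45.0,    # heading interval, giving m = 8 actions
        mu=0.0005,               # navigation end threshold
        lr=0.0001,               # learning rate alpha
        gamma=0.9,               # discount factor
        epsilon=0.9,             # epsilon-greedy parameter
        replay_capacity=500000,  # experience replay pool capacity D
        batch_m=64,              # sample batch size m
        target_update_N=200,     # target network update period N
    )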
(4) The AUV completes the geomagnetic navigation task using the DQN geomagnetic bionic navigation algorithm.
The navigation trajectory of the algorithm of the invention is shown in FIG. 2, which presents the AUV's trajectory after the DQN algorithm has converged. The whole trajectory is relatively straight and reaches the target position almost in a straight line, reflecting that the DQN algorithm has learned a good navigation strategy and guides the AUV to the target along a nearly straight path.
FIG. 3 illustrates the convergence of the normalized objective function under the algorithm of the invention. The convergence is relatively smooth, and the objective function reaches its target value in 95 steps, i.e., the AUV navigates in 95 steps. The convergence rate is high in the initial stage, decreases gradually as the AUV approaches the target position, and the objective function finally reaches the target value.
FIG. 4 shows the convergence of the sub-objective functions of the three geomagnetic components under the algorithm of the invention. All three converge smoothly and almost in unison, with the east component converging slightly faster than the other two. As navigation proceeds, the sub-objective functions of the three components converge to their target values simultaneously.
FIG. 5 plots the iteration-step statistics of the two navigation algorithms over multiple rounds of experiments. Both algorithms use the same motion step, so the step count reflects the navigation path length. Panel (a) shows the step statistics under the algorithm of the invention: the AUV needs about 650 search steps to reach the target position the first time; the required steps then decrease as the number of arrivals grows, finally converging to about 90 steps. Panel (b) shows the step statistics under the evolutionary search algorithm: its navigation is strongly random and volatile, and the red dotted line marks its average of about 260 iteration steps. Comparing panels (a) and (b), although the initial step count of the proposed algorithm exceeds even the maximum of the evolutionary search algorithm, it converges quickly to about 90 steps as the round-trip task proceeds, far below the evolutionary search average of 260 steps. The algorithm thus navigates successfully to the target position without prior geomagnetic information and shows a clear advantage in AUV round-trip tasks.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (4)

1. An underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning, characterized in that it comprises the following steps:
step 1: initialize the AUV navigation parameters: set the motion speed v of the AUV, the sampling period ΔT, the geomagnetic parameters of the initial position, the geomagnetic parameters of the target position, and the threshold μ of the objective function;
step 2: initialize the DQN algorithm parameters: initialize the current Q-value network parameters w, the target Q-value network parameters w', and the experience replay pool capacity D; initialize the state space and action space: taking the geomagnetic parameters at the AUV's position as the state feature s and the AUV's heading angle as the action a, initialize the action space as:
θ = {θ_1, θ_2, …, θ_m},  m = 2π/Δθ
where Δθ is the set heading angle interval;
step 3: select an action:
the current Q-value network takes the AUV's current state s as input, outputs the Q values of all actions in the action space, and, through an ε-greedy strategy, either selects the action with the maximum Q value or selects a random action;
step 4: execute the action:
the AUV executes the action a selected in step 3, moves to the next position according to the kinematic equation, and calculates the normalized objective function F(B,k); it obtains the next state s' and the reward r, stores (s, a, r, s') as a sample in the experience replay pool, and updates the state by letting s = s', where the reward function r is:
r = { r_a,  if F(B,k) < μ            (target reached)
      r_b,  if F(B,k) ≥ bound(k−1)   (out of bounds)
      0,    otherwise }
where k denotes a movement time of the vehicle, bound() is the boundary function defined in step 6, and r_a > 0, r_b < 0 are set reward constants;
step 5: judge whether the target position has been reached, using the normalized objective function in the current state; if the objective function is smaller than the threshold μ, the AUV is considered to have reached the target position and navigation ends; otherwise, go to the next step;
step 6: judge whether the AUV is out of bounds, using the boundary function
bound(k) = { F(B,k),      if F(B,k) < bound(k−1)
             bound(k−1),  otherwise }
if the objective function F(B,k) is greater than or equal to the boundary function bound(k−1), the AUV returns to its position at the previous time; if F(B,k) is smaller than bound(k−1), the boundary function is updated as above;
step 7: update the network parameters:
if the number of sample groups in the experience replay pool is not greater than m, return directly to step 3;
if the number of sample groups in the experience replay pool is greater than m, randomly draw m sample groups from the pool, back-propagate the error with gradient descent, and update the current Q-value network parameters w; every N time steps, copy the current Q-value network parameters to the target Q-value network, i.e., set w' = w, and then return to step 3, where the loss function of the DQN algorithm is:
Loss = (TargetQ − Q(s, a; w))²
TargetQ = r + γ·max_{a'} Q(s', a'; w')
where γ is the set discount factor; Q(s, a; w) is the Q value corresponding to action a in the Q-value vector (of the same dimension as the action space) output by the current Q-value network when the state s from the sample (s, a, r, s') is input; max_{a'} Q(s', a'; w') is the maximum Q value in the vector output by the target Q-value network when the state s' from the sample is input, and a' is the action achieving that maximum.
2. The underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning according to claim 1, wherein: the geomagnetic parameter selected in the step 1 is one or more of total magnetic field strength, east component, north component, vertical component, horizontal component, magnetic dip angle and magnetic declination.
3. The underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning according to claim 2, wherein: the geomagnetic parameters selected in the step 1 are an east component, a north component and a vertical component.
4. The underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning according to claim 1, wherein the normalized objective function F(B,k) is determined by the formula
F(B,k) = (1/n) · Σ_{i=1}^{n} ((B_k^i − B_T^i) / (B_0^i − B_T^i))²
where B_k^i is the ith geomagnetic parameter at the AUV's position at the kth time, B_T^i is the ith geomagnetic parameter of the target position, B_0^i is the ith geomagnetic parameter of the starting position, and n is the number of selected geomagnetic parameters.
CN202010683268.5A 2020-07-16 2020-07-16 Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning Pending CN111998847A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010683268.5A CN111998847A (en) 2020-07-16 2020-07-16 Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning


Publications (1)

Publication Number Publication Date
CN111998847A true CN111998847A (en) 2020-11-27

Family

ID=73468211


Country Status (1)

Country Link
CN (1) CN111998847A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115809609A (en) * 2023-02-06 2023-03-17 吉林大学 Target searching method and system for multi-underwater autonomous aircraft


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108698677A (zh) * 2015-12-09 2018-10-23 国立研究开发法人 海上·港湾·航空技术研究所 Path setting method for an underwater vehicle, optimal control method for an underwater vehicle using the same, and underwater vehicle
US20180129213A1 (en) * 2016-11-09 2018-05-10 Johnson Outdoors Inc. System and method for automatically navigating a charted contour
CN107976188A (zh) * 2017-10-12 2018-05-01 浙江大学 AUV docking navigation method based on ranging acoustic communication
CN108375379A (zh) * 2018-02-01 2018-08-07 上海理工大学 Fast path planning method based on mutation double DQN, and mobile robot
US20200074236A1 (en) * 2018-08-31 2020-03-05 Hitachi, Ltd. Reward function generation method and computer system
CN109724592A (zh) * 2019-03-03 2019-05-07 西北工业大学 AUV geomagnetic bionic navigation method based on evolutionary gradient search
CN110673637A (zh) * 2019-10-08 2020-01-10 福建工程学院 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN111307143A (zh) * 2020-02-17 2020-06-19 东南大学 Bionic navigation algorithm for multi-target evolutionary search based on geomagnetic gradient assistance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cong Wang, Yun Niu, Mingyong Liu, et al., "Geomagnetic navigation for AUV based on deep reinforcement learning algorithm", 2019 IEEE International Conference on Robotics and Biomimetics, 8 December 2019, pages 2571-2575, XP033691548, DOI: 10.1109/ROBIO49542.2019.8961491 *
Shen Haiqing, Guo Chen, Li Tieshan, et al., "Intelligent collision-avoidance navigation method for unmanned ships considering navigation experience rules", Journal of Harbin Engineering University, vol. 39, no. 6, 30 June 2018


Similar Documents

Publication Publication Date Title
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
US11727812B2 (en) Airplane flight path planning method and device based on the pigeon-inspired optimization
CN112650237B (en) Ship path planning method and device based on clustering processing and artificial potential field
CN109254591B (en) Dynamic track planning method based on Anytime restoration type sparse A and Kalman filtering
CN109724592B (en) AUV geomagnetic bionic navigation method based on evolutionary gradient search
CN114625151B (en) Underwater robot obstacle avoidance path planning method based on reinforcement learning
CN103776453B (en) A kind of multi-model scale underwater vehicle combined navigation filtering method
CN111307143B (en) Bionic navigation algorithm for multi-target evolution search based on geomagnetic gradient assistance
CN111381600B (en) UUV path planning method based on particle swarm optimization
CN110849355B (en) Bionic navigation method for geomagnetic multi-parameter multi-target rapid convergence
CN111207752B (en) Unmanned aerial vehicle track planning method based on dynamic tangent point adjustment
CN101520328A (en) Method for autonomous navigation using geomagnetic field line map
CN116263335A (en) Indoor navigation method based on vision and radar information fusion and reinforcement learning
CN106338919A (en) USV (Unmanned Surface Vehicle) track tracking control method based on enhanced learning type intelligent algorithm
CN111338350A (en) Unmanned ship path planning method and system based on greedy mechanism particle swarm algorithm
CN108897338A (en) Circular orbit Spacecraft formation Reconstructed anticollision paths planning method based on PIO
CN111930141A (en) Three-dimensional path visual tracking method for underwater robot
CN110617819A (en) Unmanned aerial vehicle terrain auxiliary navigation method based on ant colony algorithm path planning
CN109858137A (en) It is a kind of based on the complicated maneuvering-vehicle track estimation method that can learn Extended Kalman filter
CN113805609A (en) Unmanned aerial vehicle group target searching method based on chaos lost pigeon group optimization mechanism
CN116088576A (en) Unmanned aerial vehicle three-dimensional path planning method based on improved whale algorithm
CN117606490B (en) Collaborative search path planning method for autonomous underwater vehicle
CN111998847A (en) Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning
CN113325856B (en) UUV optimal operation path planning method based on countercurrent approximation strategy
CN114088098B (en) Auxiliary navigation path planning method for polar region underwater vehicle database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20201127