CN110209152A - Deep reinforcement learning control method for intelligent underwater robot vertical plane path following - Google Patents

Deep reinforcement learning control method for intelligent underwater robot vertical plane path following

Info

Publication number
CN110209152A
CN110209152A (application CN201910514354.0A)
Authority
CN
China
Prior art keywords
network
underwater robot
intelligent underwater
experience
learner
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910514354.0A
Other languages
Chinese (zh)
Other versions
CN110209152B (en)
Inventor
李晔
白德乾
姜言清
安力
武皓微
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201910514354.0A priority Critical patent/CN110209152B/en
Publication of CN110209152A publication Critical patent/CN110209152A/en
Application granted granted Critical
Publication of CN110209152B publication Critical patent/CN110209152B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots, characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/12 Target-seeking control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Medical Informatics (AREA)
  • Automation & Control Theory (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Feedback Control In General (AREA)

Abstract

The present invention provides a deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot. Step 1: according to the path-following control requirements of the intelligent underwater robot, establish the intelligent underwater robot environment that interacts with the agents; Step 2: establish the agent set; Step 3: establish the experience buffer pool; Step 4: establish the learner; Step 5: carry out path-following control of the intelligent underwater robot using a distributed deterministic policy gradient. The invention addresses the complex and changeable marine environment in which the intelligent underwater robot operates and the fact that traditional control methods cannot actively interact with the environment, and designs a deep reinforcement learning control method for vertical-plane path following of the intelligent underwater robot. The path-following control task of the intelligent underwater robot is completed with a deterministic policy gradient in a distributed manner, and the method has the advantages of self-learning, high precision, good adaptability and a stable learning process.

Description

Deep reinforcement learning control method for intelligent underwater robot vertical plane path following
Technical field
The present invention relates to a control method for underwater vehicles, and in particular to a deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot.
Background technique
With the continuous deepening of ocean development, intelligent underwater robots have been widely used in marine environmental protection and marine resource development because of their flexible movement, ease of deployment and ability to operate autonomously, and their status is becoming more and more important. Furthermore, accurately controlling an intelligent underwater robot makes some extremely hazardous tasks safe, such as exploring subsea oil, repairing subsea pipelines, and tracking and recording the positions of explosive substances.
Traditional path-following control methods, such as fuzzy logic control, PID control and S-plane control, require manual tuning of control parameters, their control effect depends on the experience of the operator, and the intelligent underwater robot cannot actively interact with the environment. In recent years, with the rapid development of artificial intelligence technology, reinforcement learning, as one of its important branches, has achieved a series of important breakthroughs. In reinforcement learning, the learner is not told which actions to take; it must instead discover by trial which actions yield the greatest return. An action affects not only the immediate reward but also the state at the next moment, and through that state all subsequent rewards.
Summary of the invention
The purpose of the present invention is to provide a deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot that has the characteristics of self-learning and high precision and can adapt to various complex marine environments.
The object of the present invention is achieved as follows:
Step 1: according to the path-following control requirements of the intelligent underwater robot, establish the intelligent underwater robot environment that interacts with the agents;
Step 2: establish the agent set;
Step 3: establish the experience buffer pool;
Step 4: establish the learner;
Step 5: carry out path-following control of the intelligent underwater robot using a distributed deterministic policy gradient.
The present invention may also include:
1. Establishing the intelligent underwater robot environment that interacts with the agents means modeling the path-following control process of the intelligent underwater robot as a Markov decision process and determining the main components of the Markov decision process: action space, state space, observation space, reward function.
2. Determining the main components of the Markov decision process specifically includes:
(1) Determine the action space
The action space expression is F = [delS], where delS denotes the rudder angle of the intelligent underwater robot's hydroplane;
(2) Determine the state space
The state space expression is S = [w, q, z, theta], where w denotes the heave velocity of the intelligent underwater robot in the body-fixed coordinate system, q denotes the pitch rate of the intelligent underwater robot in the body-fixed coordinate system, z denotes the depth of the intelligent underwater robot in the earth-fixed coordinate system, and theta denotes the pitch angle of the intelligent underwater robot in the earth-fixed coordinate system;
(3) Determine the observation space
The observation space is a function of the state space: O = f(S). For following a straight-line path: O = [w, q, zdelta, cos(theta), sin(theta)], where zdelta = z - zr and zr denotes the depth of the straight-line path;
(4) Determine the reward function
In reinforcement learning, the purpose or goal of the agent is formalized by a special signal, called the reward or reward function, which is passed from the environment to the agent and is used to evaluate the effect of the current state that results from the action taken by the intelligent underwater robot at the previous moment:
R(s, a) = R(s) + R(a)
where:
R(s) = -(α_w·w² + α_q·q² + α_z·zdelta² + α_t·theta²)
R(a) = -(α_a1·delS²)
and α_w, α_q, α_z, α_t and α_a1 are weight coefficients.
3. Establishing the agent set specifically includes:
(1) K action networks are established simultaneously, and the K action networks interact with the intelligent underwater robot environment in parallel to form the agent set;
(2) The agent set receives network parameters from the learner for updating the action networks; the agent set delivers the experience tuples generated by the interaction between the action networks and the intelligent underwater robot environment to the experience buffer pool, and the expression of a single experience tuple is:
(o_i, a_i, R(s, a)_i).
4. Establishing the experience buffer pool specifically includes:
The experience buffer pool receives from the agent set the experience tuples generated by the interaction between the action networks and the intelligent underwater robot environment, and delivers experience tuples sampled according to priority to the learner. The priority sampling expression is P(i) = p_i^α / Σ_k p_k^α, where p_i is the priority of experience tuple i and α is a small coefficient greater than 0 that determines the degree of prioritization; if α = 0, priority sampling degenerates to uniform random sampling.
5. Establishing the learner specifically includes:
(1) The learner network receives the experience tuples sampled according to priority from the experience buffer pool, and transmits the learned network parameters to the agent set;
(2) The learner uses an actor-critic structure, in which the input of the actor network is the observation space and the output is the action space, i.e. the control variable, whose expression is F = [delS]; the action networks have the same structure as the actor network. The input of the critic network is the observation space and the action space, and its output is the distribution of Z, from which the mean of Z is then obtained; Z denotes the expected return at time step t when, following policy π, action a is taken in state s, i.e. the state-action value. Estimating the distribution of the state-action value is more stable than directly estimating only its mean.
6. Carrying out path-following control of the intelligent underwater robot using the distributed deterministic policy gradient specifically includes:
(1) Initialize the number of experience tuples sampled according to priority as M = 256, the size of the experience buffer pool as R = 1000000, the number of action networks K as no more than 10, the learning rates of the actor network and the critic network in the learner as α_0 = β_0 = 0.0001, the exploration constant as ε = 0.00001, the maximum number of explorations as E = 100, and the maximum number of steps per exploration as T = 1000;
(2) Initialize the network weight parameters (θ, w) of the action networks and of the learner's actor-critic networks in a random manner, where θ is the parameter of the action networks and of the actor network in the learner, and w is the parameter of the critic network in the learner;
(3) Using the parameters initialized in step (2), establish a target network for the actor network and for the critic network in the learner respectively; the parameters of the target networks are denoted (θ', w');
(4) Run the K action networks in parallel;
(5) Sample M experience sequences of length N from the experience buffer pool according to the priorities p_i: (o_{i:i+N}, a_{i:i+N-1}, R(o, a)_{i:i+N-1});
(6) Construct the distribution of Z;
(7) Calculate the updates δθ and δw of the action networks and of the learner's actor-critic networks according to the corresponding formulas;
(8) Update the network parameters:
θ ← θ + α_t·δθ,
w ← w + β_t·δw
(9) If the number of steps in the current exploration reaches 1000, end the current exploration; otherwise return to step (5);
(10) If the number of explorations reaches 100, end the experiment; otherwise return to step (2);
(11) Return the action network, i.e. the intelligent underwater robot path-following control model with suitable parameters θ.
7. Running the K action networks in parallel specifically includes:
1) Select action a = π(o; θ) + ε·N, where the second term denotes fixed Gaussian noise;
2) Execute action a, and receive the reward R(s, a) and the observation o' at the next moment;
3) Store the experience tuple (o_i, a_i, R(s, a)_i) in the experience buffer pool;
4) Repeat steps 1)-3) until convergence or the end of training.
The present invention provides a deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot. Aiming at the complex and changeable marine environment in which the intelligent underwater robot operates, and at the fact that traditional control methods cannot actively interact with the environment, a deep reinforcement learning control method for vertical-plane path following of the intelligent underwater robot is designed.
The present invention exploits the ability of reinforcement learning to interact actively with the environment, and proposes to complete the path-following control task of the intelligent underwater robot with a deterministic policy gradient in a distributed manner; the method has the advantages of self-learning, high precision, good adaptability and a stable learning process.
The beneficial effects of the invention are as follows:
1. The present invention has the features of self-learning and good adaptability. Because reinforcement learning inherently learns through interaction with the environment, the deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot provided by the present invention can actively interact with the environment and can adapt to various complex marine environments.
2. The present invention has the features of a stable learning process and good scalability of the learning results. By using a distributed method, the deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot provided by the present invention obtains better and more stable learning signals; meanwhile, the learned control strategy can be used directly when the target path does not change drastically, without retraining, which saves time and improves efficiency.
Detailed description of the invention
Fig. 1 is the overall structure diagram of the invention;
Fig. 2 is the schematic diagram of the action network and of the actor network in the learner structure of the invention;
Fig. 3 is the schematic diagram of the critic network in the learner structure of the invention;
Fig. 4 is the simulation result of sinusoidal path following using the method of the present invention.
Specific embodiment
The present invention is described in more detail below with examples.
Fig. 1 shows the overall structure of the invention, which mainly includes:
Step 1: according to the path-following control requirements of the intelligent underwater robot, establish the intelligent underwater robot environment that interacts with the agents.
Step 2: establish the agent set.
Step 3: establish the experience buffer pool.
Step 4: establish the learner.
Step 5: carry out path-following control of the intelligent underwater robot using a distributed deterministic policy gradient.
The deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot proposed by the present invention is described in more detail below with reference to the drawings and specific embodiments.
The detailed implementation of the invention includes the following steps:
1. Model the path-following control task of the intelligent underwater robot as a Markov decision process and determine the main components of the Markov decision process: action space, state space, observation space, reward function.
First step: determine the action space
The action space expression is F = [delS], where delS denotes the rudder angle of the intelligent underwater robot's hydroplane;
Second step: determine the state space
The state space expression is S = [w, q, z, theta], where w denotes the heave velocity of the intelligent underwater robot in the body-fixed coordinate system, q denotes the pitch rate of the intelligent underwater robot in the body-fixed coordinate system, z denotes the depth of the intelligent underwater robot in the earth-fixed coordinate system, and theta denotes the pitch angle of the intelligent underwater robot in the earth-fixed coordinate system.
Third step: determine the observation space
The observation space is a function of the state space: O = f(S). For following a straight-line path, O = [w, q, zdelta, cos(theta), sin(theta)], where zdelta = z - zr and zr denotes the depth of the straight-line path.
Fourth step: determine the reward function
In reinforcement learning, the purpose or goal of the agent is formalized by a special signal, called the reward or reward function, which is passed from the environment to the agent and is used to evaluate the effect of the current state that results from the action taken by the intelligent underwater robot at the previous moment:
R(s, a) = R(s) + R(a)
where:
R(s) = -(α_w·w² + α_q·q² + α_z·zdelta² + α_t·theta²)
R(a) = -(α_a1·delS²)
with α_w = 0.5, α_q = 0.5, α_z = 1, α_t = 1, α_a1 = 0.001.
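For illustration only (this sketch is not part of the patent text; the function names and the example values are assumptions), the observation of the third step and the reward of the fourth step can be computed as follows:

import math

def observation(w, q, z, theta, zr):
    # Map the state S = [w, q, z, theta] to the observation O used for
    # straight-line path following; zr is the depth of the reference path.
    zdelta = z - zr
    return [w, q, zdelta, math.cos(theta), math.sin(theta)]

def reward(w, q, zdelta, theta, delS,
           aw=0.5, aq=0.5, az=1.0, at=1.0, aa1=0.001):
    # R(s, a) = R(s) + R(a): penalize state deviations and rudder effort,
    # using the example weight values given above.
    r_state = -(aw * w**2 + aq * q**2 + az * zdelta**2 + at * theta**2)
    r_action = -(aa1 * delS**2)
    return r_state + r_action

# Example: robot at 9.5 m depth following a straight-line path at 10 m depth.
o = observation(w=0.02, q=0.01, z=9.5, theta=0.05, zr=10.0)
print(o)
print(reward(w=0.02, q=0.01, zdelta=o[2], theta=0.05, delS=5.0))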
2. Establish the agent set, specifically:
First step: K = 3 action networks are established simultaneously, and the K = 3 action networks interact with the intelligent underwater robot environment in parallel to form the agent set;
Second step: the agent set receives network parameters from the learner for updating the action networks; the agent set delivers the experience tuples generated by the interaction between the action networks and the intelligent underwater robot environment to the experience buffer pool, and the expression of a single experience tuple is:
(o_i, a_i, R(s, a)_i);
Third step: each action network (Fig. 2) contains two hidden layers h1 and h2 and an output layer, where h1 has 400 nodes, h2 has 300 nodes, and the output layer uses the hyperbolic tangent function tanh.
3. Establish the experience buffer pool, specifically:
The experience buffer pool receives from the agent set the experience tuples obtained from the interaction between the action networks and the environment, and delivers experience tuples sampled according to priority to the learner. The priority sampling expression is P(i) = p_i^α / Σ_k p_k^α, where p_i is the priority of experience tuple i and α is a small coefficient greater than 0 that determines the degree of prioritization; if α = 0, priority sampling degenerates to uniform random sampling.
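A minimal sketch of such priority sampling (not part of the patent text; it follows the standard prioritized experience replay form, and the variable names and the default exponent are assumptions):

import numpy as np

def sample_indices(priorities, batch_size, alpha=0.6, rng=np.random.default_rng()):
    # Sample tuple indices with probability P(i) = p_i**alpha / sum_k p_k**alpha.
    # With alpha = 0 this reduces to uniform random sampling.
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    return rng.choice(len(priorities), size=batch_size, p=probs)

# Example: a buffer of 5 tuples whose priorities could be their last TD errors.
print(sample_indices([0.1, 2.0, 0.5, 1.5, 0.05], batch_size=3))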
4. Establish the learner network, specifically:
First step: the learner network receives the experience tuples sampled according to priority from the experience buffer pool, and transmits the learned network parameters to the agent set.
Second step: the learner uses an actor-critic structure. The input of the actor network (Fig. 2) is the observation space and its output is the action space, i.e. the control variable, whose expression is F = [delS]; the actor network contains two hidden layers h1 and h2 and an output layer, where h1 has 400 nodes, h2 has 300 nodes, and the output layer uses the hyperbolic tangent function tanh. The input of the critic network (Fig. 3) is the observation space and the action space, and its output is the distribution of Z, where Z denotes the expected return at time step t when, following policy π, action a is taken in state s, i.e. the state-action value. Estimating the distribution of the state-action value rather than only its mean makes the learning process more stable. The critic network contains two hidden layers h1 and h2 and an output layer, where h1 has 400 nodes, h2 has 300 nodes, and the output layer uses the softmax function.
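As an illustrative sketch only (not part of the patent text), the actor network, the distributional critic network and the recovery of the state-action value as the mean of Z could look as follows in PyTorch; the observation dimension, the number and support of the distribution atoms, the ReLU hidden activations and all names are assumptions:

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_ATOMS = 5, 1, 51          # assumed sizes: O has 5 components, the action is [delS]

class Actor(nn.Module):
    # Two hidden layers (400, 300) and a tanh output, as described for the actor/action network.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, ACT_DIM), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    # Takes observation and action, outputs a softmax distribution over N_ATOMS return values.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, N_ATOMS), nn.Softmax(dim=-1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

actor, critic = Actor(), Critic()
obs = torch.zeros(1, OBS_DIM)
probs = critic(obs, actor(obs))                       # distribution of Z
z_atoms = torch.linspace(-10.0, 0.0, N_ATOMS)         # assumed support of the return distribution
q_value = (probs * z_atoms).sum(dim=-1)               # mean of Z, i.e. the state-action value
print(q_value)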
5. Carry out path-following control of the intelligent underwater robot using the distributed deterministic policy gradient, which includes the following steps:
Step 1: initialize the number of experience tuples sampled according to priority as M = 256, the size of the experience buffer pool as R = 1000000, the number of action networks K (the value of K is adjusted according to the specific path-following task and is usually no more than 10), the learning rates of the actor network and the critic network in the learner network as α_0 = β_0 = 0.0001, the exploration constant as ε = 0.00001, the maximum number of explorations as E = 100, and the maximum number of steps per exploration as T = 1000.
Step 2: initialize the network weight parameters (θ, w) of the action networks and of the learner network in a random manner, where θ is the parameter of the action networks and of the actor network in the learner network, and w is the parameter of the critic network in the learner network.
Step 3: using the parameters initialized in Step 2, establish a target network for the actor network and for the critic network in the learner network respectively, in order to reduce oscillation during learning; the parameters of the target networks are denoted (θ', w').
Step 4: run the K action networks in parallel.
Step 5: sample M experience sequences of length N from the experience buffer pool according to the priorities p_i: (o_{i:i+N}, a_{i:i+N-1}, R(o, a)_{i:i+N-1}).
Step 6: construct the distribution of Z.
Step 7: calculate the updates δθ and δw of the action networks and of the learner network according to the corresponding formulas.
Step 8: update the network parameters:
θ ← θ + α_t·δθ
w ← w + β_t·δw
Step 9: if the number of steps in the current exploration reaches 1000, end the current exploration; otherwise return to Step 5.
Step 10: if the number of explorations reaches 100, end the experiment; otherwise return to Step 2.
Step 11: return the action network, i.e. the intelligent underwater robot path-following control model with suitable parameters θ.
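For orientation only, the following runnable skeleton mirrors the control flow of Steps 2-11; it is not part of the patent text, and the experience tuples, the distribution of Z and the gradient updates are stub placeholders, since the patent's formulas are not reproduced here. E and T are also shrunk from the patent's E = 100 and T = 1000 so the skeleton finishes instantly:

import numpy as np

M, E, T, N_ATOMS = 256, 2, 10, 51
alpha_t = beta_t = 1e-4
rng = np.random.default_rng(0)

buffer = [(rng.normal(size=5), rng.normal(), rng.normal()) for _ in range(5000)]  # stub experience tuples

for episode in range(E):                                # Step 10: at most E explorations
    theta = rng.normal(size=10)                         # Step 2: random initialization of actor/action-network parameters
    w = rng.normal(size=10)                             # Step 2: random initialization of critic parameters
    theta_target, w_target = theta.copy(), w.copy()     # Step 3: target-network parameters (theta', w')
    for step in range(T):                               # Step 9: at most T learner updates per exploration
        idx = rng.choice(len(buffer), size=M)           # Step 5: stand-in for prioritized sequence sampling
        batch = [buffer[i] for i in idx]
        z_dist = np.full(N_ATOMS, 1.0 / N_ATOMS)        # Step 6: stub distribution of Z
        delta_theta = np.zeros_like(theta)              # Step 7: actor update (patent formula omitted)
        delta_w = np.zeros_like(w)                      # Step 7: critic update (patent formula omitted)
        theta += alpha_t * delta_theta                  # Step 8
        w += beta_t * delta_w                           # Step 8

print("returned control-model parameters:", theta[:3])  # Step 11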
6. Carry out path-following control of the intelligent underwater robot using the distributed deterministic policy gradient, where Step 4 is specifically:
First step: select action a = π(o; θ) + ε·N, where the second term denotes fixed Gaussian noise and ε is a coefficient used to control the range of the noise.
Second step: execute action a, and receive the reward R(s, a) and the observation o' at the next moment.
Third step: store the experience tuple (o_i, a_i, R(s, a)_i) in the experience buffer pool.
Fourth step: repeat the above steps until convergence or the end of training.
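For illustration only (not part of the patent text; the policy, the environment dynamics and all names are stubs), the loop executed by each action network can be sketched as:

import numpy as np

rng = np.random.default_rng(1)

def policy(obs, theta, eps=1e-5):
    # Select a = pi(o; theta) + eps * N(0, 1): a stub linear policy plus fixed Gaussian noise.
    return float(theta @ obs) + eps * rng.normal()

def env_step(obs, action):
    # Stub environment: returns a reward and the next observation (placeholder dynamics).
    next_obs = obs + 0.01 * rng.normal(size=obs.shape)
    reward = -(obs[2] ** 2)                      # e.g. penalize the depth-error component
    return reward, next_obs

theta = np.zeros(5)                              # parameters received from the learner (stub)
buffer = []                                      # shared experience buffer pool (stub: a local list)
obs = np.zeros(5)

for step in range(1000):                         # repeat 1)-3) until convergence or training ends
    a = policy(obs, theta)                       # 1) select the action with exploration noise
    r, next_obs = env_step(obs, a)               # 2) execute a, receive R(s, a) and o'
    buffer.append((obs, a, r))                   # 3) store the experience tuple (o_i, a_i, R(s, a)_i)
    obs = next_obs

print(len(buffer), "experience tuples collected")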

Claims (8)

1. A deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot, characterized in that it comprises:
Step 1: according to the path-following control requirements of the intelligent underwater robot, establishing the intelligent underwater robot environment that interacts with the agents;
Step 2: establishing the agent set;
Step 3: establishing the experience buffer pool;
Step 4: establishing the learner;
Step 5: carrying out path-following control of the intelligent underwater robot using a distributed deterministic policy gradient.
2. The deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot according to claim 1, characterized in that establishing the intelligent underwater robot environment that interacts with the agents means modeling the path-following control process of the intelligent underwater robot as a Markov decision process and determining the main components of the Markov decision process: action space, state space, observation space, reward function.
3. The deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot according to claim 2, characterized in that determining the main components of the Markov decision process specifically includes:
(1) Determining the action space
The action space expression is F = [delS], where delS denotes the rudder angle of the intelligent underwater robot's hydroplane;
(2) Determining the state space
The state space expression is S = [w, q, z, theta], where w denotes the heave velocity of the intelligent underwater robot in the body-fixed coordinate system, q denotes the pitch rate of the intelligent underwater robot in the body-fixed coordinate system, z denotes the depth of the intelligent underwater robot in the earth-fixed coordinate system, and theta denotes the pitch angle of the intelligent underwater robot in the earth-fixed coordinate system;
(3) Determining the observation space
The observation space is a function of the state space: O = f(S). For following a straight-line path: O = [w, q, zdelta, cos(theta), sin(theta)], where zdelta = z - zr and zr denotes the depth of the straight-line path;
(4) Determining the reward function
In reinforcement learning, the purpose or goal of the agent is formalized by a special signal, called the reward or reward function, which is passed from the environment to the agent and is used to evaluate the effect of the current state that results from the action taken by the intelligent underwater robot at the previous moment:
R(s, a) = R(s) + R(a)
where:
R(s) = -(α_w·w² + α_q·q² + α_z·zdelta² + α_t·theta²)
R(a) = -(α_a1·delS²)
and α_w, α_q, α_z, α_t and α_a1 are weight coefficients.
4. The deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot according to claim 1, characterized in that establishing the agent set specifically includes:
(1) K action networks are established simultaneously, and the K action networks interact with the intelligent underwater robot environment in parallel to form the agent set;
(2) The agent set receives network parameters from the learner for updating the action networks; the agent set delivers the experience tuples generated by the interaction between the action networks and the intelligent underwater robot environment to the experience buffer pool, and the expression of a single experience tuple is:
(o_i, a_i, R(s, a)_i).
5. The deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot according to claim 1, characterized in that establishing the experience buffer pool specifically includes:
The experience buffer pool receives from the agent set the experience tuples obtained from the interaction between the action networks and the intelligent underwater robot environment, and delivers experience tuples sampled according to priority to the learner; the priority sampling expression is P(i) = p_i^α / Σ_k p_k^α,
where p_i is the priority of experience tuple i and α is a small coefficient greater than 0 that determines the degree of prioritization; if α = 0, priority sampling degenerates to uniform random sampling.
6. The deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot according to claim 1, characterized in that establishing the learner specifically includes:
(1) The learner network receives the experience tuples sampled according to priority from the experience buffer pool, and transmits the learned network parameters to the agent set;
(2) The learner uses an actor-critic structure, in which the input of the actor network is the observation space and the output is the action space, i.e. the control variable, whose expression is F = [delS]; the action networks have the same structure as the actor network; the input of the critic network is the observation space and the action space, and its output is the distribution of Z, from which the mean of Z is then obtained; Z denotes the expected return at time step t when, following policy π, action a is taken in state s, i.e. the state-action value; estimating the distribution of the state-action value is more stable than directly estimating only its mean.
7. The deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot according to claim 1, characterized in that carrying out path-following control of the intelligent underwater robot using the distributed deterministic policy gradient specifically includes:
(1) Initializing the number of experience tuples sampled according to priority as M = 256, the size of the experience buffer pool as R = 1000000, the number of action networks K as no more than 10, the learning rates of the actor network and the critic network in the learner as α_0 = β_0 = 0.0001, the exploration constant as ε = 0.00001, the maximum number of explorations as E = 100, and the maximum number of steps per exploration as T = 1000;
(2) Initializing the network weight parameters (θ, w) of the action networks and of the learner's actor-critic networks in a random manner, where θ is the parameter of the action networks and of the actor network in the learner, and w is the parameter of the critic network in the learner;
(3) Using the parameters initialized in step (2), establishing a target network for the actor network and for the critic network in the learner respectively, the parameters of the target networks being denoted (θ', w');
(4) Running the K action networks in parallel;
(5) Sampling M experience sequences of length N from the experience buffer pool according to the priorities p_i: (o_{i:i+N}, a_{i:i+N-1}, R(o, a)_{i:i+N-1});
(6) Constructing the distribution of Z;
(7) Calculating the updates δθ and δw of the action networks and of the learner's actor-critic networks according to the corresponding formulas;
(8) Updating the network parameters:
θ ← θ + α_t·δθ,
w ← w + β_t·δw
(9) If the number of steps in the current exploration reaches 1000, ending the current exploration; otherwise returning to step (5);
(10) If the number of explorations reaches 100, ending the experiment; otherwise returning to step (2);
(11) Returning the action network, i.e. the intelligent underwater robot path-following control model with suitable parameters θ.
8. The deep reinforcement learning control method for vertical-plane path following of an intelligent underwater robot according to claim 7, characterized in that running the K action networks in parallel specifically includes:
1) Selecting action a = π(o; θ) + ε·N, where the second term denotes fixed Gaussian noise;
2) Executing action a, and receiving the reward R(s, a) and the observation o' at the next moment;
3) Storing the experience tuple (o_i, a_i, R(s, a)_i) in the experience buffer pool;
4) Repeating steps 1)-3) until convergence or the end of training.
CN201910514354.0A 2019-06-14 2019-06-14 Depth reinforcement learning control method for intelligent underwater robot vertical plane path following Active CN110209152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910514354.0A CN110209152B (en) 2019-06-14 2019-06-14 Depth reinforcement learning control method for intelligent underwater robot vertical plane path following

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910514354.0A CN110209152B (en) 2019-06-14 2019-06-14 Depth reinforcement learning control method for intelligent underwater robot vertical plane path following

Publications (2)

Publication Number Publication Date
CN110209152A true CN110209152A (en) 2019-09-06
CN110209152B CN110209152B (en) 2022-04-05

Family

ID=67792707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910514354.0A Active CN110209152B (en) 2019-06-14 2019-06-14 Depth reinforcement learning control method for intelligent underwater robot vertical plane path following

Country Status (1)

Country Link
CN (1) CN110209152B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728368A (en) * 2019-10-25 2020-01-24 中国人民解放军国防科技大学 Acceleration method for deep reinforcement learning of simulation robot
CN110750096A (en) * 2019-10-09 2020-02-04 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in static environment
CN111638646A (en) * 2020-05-29 2020-09-08 平安科技(深圳)有限公司 Four-legged robot walking controller training method and device, terminal and storage medium
CN112462792A (en) * 2020-12-09 2021-03-09 哈尔滨工程大学 Underwater robot motion control method based on Actor-Critic algorithm
CN113033118A (en) * 2021-03-10 2021-06-25 山东大学 Autonomous floating control method of underwater vehicle based on demonstration data reinforcement learning technology
CN113110530A (en) * 2021-04-16 2021-07-13 大连海事大学 Underwater robot path planning method for three-dimensional environment
CN113534668A (en) * 2021-08-13 2021-10-22 哈尔滨工程大学 Maximum entropy based AUV (autonomous Underwater vehicle) motion planning method for actor-critic framework
CN115657683A (en) * 2022-11-14 2023-01-31 中国电子科技集团公司第十研究所 Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
CN116295449A (en) * 2023-05-25 2023-06-23 吉林大学 Method and device for indicating path of autonomous underwater vehicle

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180293512A1 (en) * 2017-04-11 2018-10-11 International Business Machines Corporation New rule creation using mdp and inverse reinforcement learning
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
CN109379752A (en) * 2018-09-10 2019-02-22 ***通信集团江苏有限公司 Optimization method, device, equipment and the medium of Massive MIMO
CN109407644A (en) * 2019-01-07 2019-03-01 齐鲁工业大学 One kind being used for manufacturing enterprise's Multi-Agent model control method and system
CN109739090A (en) * 2019-01-15 2019-05-10 哈尔滨工程大学 A kind of autonomous type underwater robot neural network intensified learning control method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TOM SCHAUL et al.: "Prioritized Experience Replay", Proceedings of Workshops at the 4th International Conference on Learning Representations *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750096A (en) * 2019-10-09 2020-02-04 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in static environment
CN110750096B (en) * 2019-10-09 2022-08-02 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in static environment
CN110728368B (en) * 2019-10-25 2022-03-15 中国人民解放军国防科技大学 Acceleration method for deep reinforcement learning of simulation robot
CN110728368A (en) * 2019-10-25 2020-01-24 中国人民解放军国防科技大学 Acceleration method for deep reinforcement learning of simulation robot
CN111638646A (en) * 2020-05-29 2020-09-08 平安科技(深圳)有限公司 Four-legged robot walking controller training method and device, terminal and storage medium
CN111638646B (en) * 2020-05-29 2024-05-28 平安科技(深圳)有限公司 Training method and device for walking controller of quadruped robot, terminal and storage medium
CN112462792A (en) * 2020-12-09 2021-03-09 哈尔滨工程大学 Underwater robot motion control method based on Actor-Critic algorithm
CN113033118A (en) * 2021-03-10 2021-06-25 山东大学 Autonomous floating control method of underwater vehicle based on demonstration data reinforcement learning technology
CN113110530B (en) * 2021-04-16 2023-11-21 大连海事大学 Underwater robot path planning method for three-dimensional environment
CN113110530A (en) * 2021-04-16 2021-07-13 大连海事大学 Underwater robot path planning method for three-dimensional environment
CN113534668B (en) * 2021-08-13 2022-06-10 哈尔滨工程大学 Maximum entropy based AUV (autonomous Underwater vehicle) motion planning method for actor-critic framework
CN113534668A (en) * 2021-08-13 2021-10-22 哈尔滨工程大学 Maximum entropy based AUV (autonomous Underwater vehicle) motion planning method for actor-critic framework
CN115657683B (en) * 2022-11-14 2023-05-02 中国电子科技集团公司第十研究所 Unmanned cable-free submersible real-time obstacle avoidance method capable of being used for inspection operation task
CN115657683A (en) * 2022-11-14 2023-01-31 中国电子科技集团公司第十研究所 Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
CN116295449A (en) * 2023-05-25 2023-06-23 吉林大学 Method and device for indicating path of autonomous underwater vehicle
CN116295449B (en) * 2023-05-25 2023-09-12 吉林大学 Method and device for indicating path of autonomous underwater vehicle

Also Published As

Publication number Publication date
CN110209152B (en) 2022-04-05

Similar Documents

Publication Publication Date Title
CN110209152A (en) The deeply learning control method that Intelligent Underwater Robot vertical plane path follows
CN112241176B (en) Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment
CN108319293B (en) UUV real-time collision avoidance planning method based on LSTM network
Zhang et al. Ship motion attitude prediction based on an adaptive dynamic particle swarm optimization algorithm and bidirectional LSTM neural network
Wu et al. An optimization method for control parameters of underwater gliders considering energy consumption and motion accuracy
Cao et al. Target search control of AUV in underwater environment with deep reinforcement learning
CN110362089A (en) A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN103744428A (en) Unmanned surface vehicle path planning method based on neighborhood intelligent water drop algorithm
CN114625151A (en) Underwater robot obstacle avoidance path planning method based on reinforcement learning
Zhang et al. AUV path tracking with real-time obstacle avoidance via reinforcement learning under adaptive constraints
CN110472738A (en) A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
CN110095120A (en) Biology of the Autonomous Underwater aircraft under ocean circulation inspires Self-organizing Maps paths planning method
CN109885061B (en) Improved NSGA-II-based dynamic positioning multi-objective optimization method
CN113837454A (en) Hybrid neural network model prediction method and system for three degrees of freedom of ship
Zhu et al. AUV dynamic obstacle avoidance method based on improved PPO algorithm
CN109460874A (en) A kind of ariyoshi wave height prediction technique based on deep learning
Ma et al. Multi-AUV collaborative operation based on time-varying navigation map and dynamic grid model
CN104155043B (en) A kind of dynamic positioning system external environment force measuring method
Zhou et al. Nonparametric modeling of ship maneuvering motions in calm water and regular waves based on R-LSTM hybrid method
Wang et al. MUTS-based cooperative target stalking for a multi-USV system
CN117555352A (en) Ocean current assisted path planning method based on discrete SAC
CN114840928B (en) Underwater vehicle cluster motion simulation method based on deep learning
Song et al. Search and tracking strategy of autonomous surface underwater vehicle in oceanic eddies based on deep reinforcement learning
CN116541951A (en) Ship thrust distribution method based on improved aigrette algorithm
CN112327838B (en) Multi-unmanned surface vessel multi-task allocation method based on improved self-mapping algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant