Disclosure of Invention
The invention aims to overcome the above defects and provides an underwater robot control method based on reinforcement learning, which can accurately track a target trajectory, reduce the number of samples required for a system with uncertain parameters, and realize control by having the underwater robot learn from its environment.
To achieve this purpose, the invention adopts the following technical scheme:
An underwater robot control method based on reinforcement learning, characterized by comprising the following steps:
step 1, establishing, for the position of the underwater robot, a fixed reference system based on the robot's desired trajectory position and an inertial reference system based on the uncertain factors of the underwater environment;
step 2, for the inertial reference system, constructing an output model G(a_1, a_2, a_3) of the system mapping of the robot based on the uncertain factors in the front-back, left-right, and up-down directions,
where a_i is the i-th uncertain factor acting on the underwater robot, and each uncertain factor a_i follows its own independent probability density function;
sampling each uncertain factor at fixed points according to its respective probability density function, training the system-mapping robot output model with the sampling points, and constructing a reduced-order system-mapping robot output model G'(a_1, a_2, a_3), whose coefficients are the coefficients of the uncertain factors in the low-order mapping;
step 3, converting the real position of the underwater robot into coordinates in the fixed reference system of step 1, and obtaining the model output of the robot's reduced-order system mapping in the inertial reference system of step 2;
step 4, defining the real position of the underwater robot in state k as
p(k) = [x(k), y(k), z(k)]^T
and the desired trajectory position of the underwater robot in state k as
p_r(k) = [x_r(k), y_r(k), z_r(k)]^T;
defining the one-step cost function of the next action of the underwater robot in state k as
g_k(p, u) = (x(k) - x_r(k))^2 + (y(k) - y_r(k))^2 + (z(k) - z_r(k))^2 + u^2(k),
where (x - x_r)^2 + (y - y_r)^2 + (z - z_r)^2 represents the cost of the underwater robot's position error, u is the underwater robot controller input, and u^2 represents the cost of the energy consumed;
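As a concrete illustration of the one-step cost defined above (the numeric values in the usage line are arbitrary), g_k can be computed as:

```python
def one_step_cost(p, p_r, u):
    """g_k(p, u): squared position-tracking error plus squared control effort."""
    return sum((pi - ri) ** 2 for pi, ri in zip(p, p_r)) + u ** 2

# Example: position error of 1 in x only, control input u = 0.5.
cost = one_step_cost((1.0, 2.0, 3.0), (0.0, 2.0, 3.0), 0.5)  # 1.0 + 0.25 = 1.25
```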
training the robot with the one-step cost function generated by the position movement of the underwater robot to obtain the value function
V(p(k)) = E_a(k){g_k(p, u) + γV(p(k+1))},
where γ ∈ (0, 1) is a discount factor and E_a(k) represents the expectation function in state k;
letting V = W^T Φ(p) and obtaining the value model of the control method by an iterative weighting method:
W_(j+1)^T Φ(p(k)) = E_a(k)[g_k(p, u) + γ W_j^T Φ(p(k+1))],
where Φ(p) is the basis vector and W is the weight vector;
step 5, solving the value model of the control method: letting h(p) = U^T σ(p), where the weight vector U is updated with a gradient descent method and the control method is improved by minimizing the cost function,
where h(p) is the next action performed in each state when the underwater robot carries out position tracking, and h(p) serves as the optimal control strategy;
step 6, iterating by the iterative weighting method until the two processes of updating the value model of the control method and improving the control strategy both converge, thereby completing the solution of the optimal control strategy in the current state;
and step 7, feeding the real position of step 3 into step 4, obtaining the next optimal control strategy through the operations of steps 5-6, inputting this optimal control strategy into the system-mapping robot output model of step 2, and cyclically repeating steps 3 to 7 to complete the tracking task of the underwater robot.
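The seven steps above can be sketched as a single control loop; the sketch below is illustrative only, with the hypothetical callables `reduced_order_model` and `solve_policy` standing in for the model of step 2 and the solver of steps 4-6:

```python
import numpy as np

def tracking_loop(p0, p_ref, reduced_order_model, solve_policy, n_steps=100):
    """Illustrative outer loop of steps 3-7: observe the position, solve
    for the optimal action h(p) in the current state, apply it through
    the reduced-order system model, and repeat."""
    p = np.asarray(p0, dtype=float)
    trajectory = [p.copy()]
    for k in range(n_steps):
        u = solve_policy(p, p_ref(k))   # steps 4-6: optimal action for state k
        p = reduced_order_model(p, u)   # step 2: next state from the model
        trajectory.append(p.copy())
    return np.array(trajectory)
```

With a trivial model p + u and the error-cancelling policy u = p_r - p, the loop reaches the reference in one step; in practice both callables would come from the trained reduced-order model and the learned policy.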
In a further technical scheme, the uncertain factors in step 1 are the underwater surge, sway, and heave.
In a further technical scheme, the output mean E'(G'(a_1, a_2, a_3)) of the reduced-order system-mapping robot output model in step 2 is identical to the output mean E(G(a_1, a_2, a_3)) of the system-mapping robot output model.
In a further technical scheme, the specific steps of step 4 are as follows:
the position of the underwater robot in state k is p(k) = [x(k), y(k), z(k)]^T and the desired trajectory is p_r(k) = [x_r(k), y_r(k), z_r(k)]^T. To obtain an optimal control strategy, namely the action h performed by the underwater robot in each state during position tracking, the one-step cost function of the underwater robot in state k is set as
g_k(p, u) = (x(k) - x_r(k))^2 + (y(k) - y_r(k))^2 + (z(k) - z_r(k))^2 + u^2(k),
where (x - x_r)^2 + (y - y_r)^2 + (z - z_r)^2 represents the cost of the tracking error, u is the underwater robot controller input, and u^2 represents the cost of the energy consumed. The value function is then calculated from the one-step cost function:
V(p(k)) = E_a(k){g_k(p, u) + γV(p(k+1))},
where γ ∈ (0, 1) is a discount factor and E_a(k) represents the expectation function in state k.
In the value update process, let V = W^T Φ(p); the value function can then be expressed as
W_(j+1)^T Φ(p(k)) = E_a(k)[g_k(p, u) + γ W_j^T Φ(p(k+1))],
where Φ(p) is the basis vector and W is the weight vector, solved iteratively by the least squares method. After the value function is obtained, in the strategy improvement step the optimal tracking control strategy is solved by setting a basis vector and a weight vector: let h(p) = U^T σ(p), where the weight vector U is updated by gradient descent and σ(p) is the basis vector; the control strategy is improved by minimizing the cost function,
and h(p) is the control strategy obtained by the underwater robot through learning from the environment, this strategy being the optimal control strategy.
In a further technical scheme, the specific content of step 6 is as follows:
each time the value model of the control method is updated and the control strategy is improved by the iterative weighting method, convergence is deemed reached when the resulting weight change is smaller than the threshold of 0.001, and the h obtained at the end of the iteration is input to the underwater robot as the controller input u.
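The convergence criterion described here can be sketched as follows (a minimal illustration; the weight vectors would come from successive value-update or strategy-improvement iterations):

```python
import numpy as np

def has_converged(w_old, w_new, tol=1e-3):
    """Deem the iteration converged when the weight change is below the
    0.001 threshold (maximum absolute change across all components)."""
    return np.max(np.abs(np.asarray(w_new) - np.asarray(w_old))) < tol
```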
The reinforcement-learning-based underwater robot control method controls the underwater robot so as to track a tracked object.
Compared with the prior art, the invention has the following advantages:
the invention samples the uncertain parameters of the underwater robot relating to the underwater uncertain factors by using a reduction method, and can give accurate output statistics of the original mapping, thereby reducing the calculation cost and effectively reducing the simulation times.
The invention uses a reinforcement learning method to track the position of the underwater robot, combines the advantages of adaptive and optimal control, and seeks an optimal feedback strategy from the responses of the environment. Using the surrounding environment information, the underwater robot can, through multiple iterations of self-learning, find the control strategy that best conforms to the target trajectory.
The invention realizes intelligent tracking by the underwater robot. Sampling the uncertain parameters of the underwater robot by a reduction method and combining this with reinforcement learning turns the backward-looking real-time optimal control of the underwater robot system into forward-looking adaptive control, so that the underwater robot can better complete trajectory tracking.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
As shown in fig. 2, the invention arranges a buoy relay on the water surface; the buoy relay is used for the self-positioning of the underwater robot, and the control center gives the desired trajectory information of the underwater robot and sends it to the underwater robot. The underwater robot controller then controls the driver according to the system control to complete the motion of the underwater robot.
As shown in fig. 1, the present invention provides a reinforcement learning-based underwater robot control method, which comprises the following steps:
Step one: the underwater robot is influenced by its surrounding underwater environment, so the uncertain factors in the underwater robot model need to be evaluated before the control of the driver by the underwater robot controller can be completed. The underwater robot has six degrees of freedom, spanning the up-down, left-right, and front-back directions, and its dynamic characteristics can be described by two reference systems: a fixed reference system based on the robot's desired trajectory position and an inertial reference system based on the uncertain factors of the underwater environment. Both reference systems consider the up-down, left-right, and front-back directions, and the underwater inertial reference system introduces uncertain parameters from factors such as underwater surge, sway, and heave.
In the inertial reference system, the linear velocities in the three directions of surge, sway, and heave are mutually perpendicular, and the influence of roll, pitch, and yaw on the angular velocity of the underwater robot is also considered along the linear-velocity directions.
Step two: owing to the random influence of the underwater environment, the uncertain parameters of the underwater robot are estimated in the three directions separately. A group of sampling points is selected for each parameter according to the respective probability density functions of the uncertain parameters, and the sampling points are used to reduce the order of the robot model, so that the output of the controller can be obtained with only a few calculations while the output mean is guaranteed to be the same as that of the original model; the underwater robot thus adapts to the underwater environment and achieves more accurate control.
The specific steps are as follows: for the inertial reference system, an output model G(a_1, a_2, a_3) of the system mapping of the robot based on the uncertain factors is constructed in the front-back, left-right, and up-down directions. Here a_i is the i-th uncertain factor acting on the underwater robot; in this embodiment, factors such as underwater surge, sway, and heave introduce the uncertain parameters, and the remaining quantities are coefficients. Each uncertain parameter (or uncertain factor) a_i follows its own independent probability density function. Each uncertain factor is sampled at fixed points according to its respective probability density function, the sampling points are used to train the system-mapping robot output model, and a reduced-order system-mapping robot output model G'(a_1, a_2, a_3) is constructed, whose coefficients are the new coefficients of the uncertain parameters in the low-order mapping. The output mean E'(G'(a_1, a_2, a_3)) of the reduced-order system-mapping robot output model is identical to the output mean E(G(a_1, a_2, a_3)) of the system-mapping robot output model, i.e., E'(G'(a_1, a_2, a_3)) = E(G(a_1, a_2, a_3)).
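The mean-matching property E'(G'(a_1, a_2, a_3)) = E(G(a_1, a_2, a_3)) can be illustrated with deterministic fixed-point sampling; the Gaussian densities and the polynomial mapping G below are assumptions chosen for illustration only, not the mapping of the invention:

```python
import numpy as np

def fixed_point_samples(mu, sigma, n=3):
    """Deterministic (fixed-point) samples and weights for a Gaussian
    uncertain parameter, via probabilists' Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n)
    return mu + sigma * nodes, weights / weights.sum()

# Assumed example mapping G of the three uncertain factors.
def G(a1, a2, a3):
    return a1 ** 2 + 0.5 * a2 + a3

# Sample each factor at 3 fixed points instead of many random draws.
pts1, w1 = fixed_point_samples(0.0, 0.1)
pts2, w2 = fixed_point_samples(0.0, 0.2)
pts3, w3 = fixed_point_samples(0.0, 0.05)

# Reduced-order output mean E'(G') from only 27 deterministic evaluations;
# for this G it equals the exact mean E[a1^2] = 0.1^2 = 0.01.
mean_reduced = sum(
    wi * wj * wk * G(x, y, z)
    for x, wi in zip(pts1, w1)
    for y, wj in zip(pts2, w2)
    for z, wk in zip(pts3, w3)
)
```

For polynomial mappings up to the quadrature's exactness degree, this 3-point rule reproduces the output mean of the original mapping exactly, which is the sense in which E'(G') = E(G) while requiring far fewer evaluations than Monte Carlo sampling.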
Step three: the real position of the underwater robot is converted into coordinates in the fixed reference system of step 1, and the model output of the robot's reduced-order system mapping in the inertial reference system of step 2 is obtained.
Step four: the position of the underwater robot in state k is defined as
p(k) = [x(k), y(k), z(k)]^T,
and the desired trajectory to be tracked is
p_r(k) = [x_r(k), y_r(k), z_r(k)]^T.
To obtain an optimal control strategy, namely the action h performed by the underwater robot in each state during position tracking, the one-step cost function of the underwater robot in state k is set as
g_k(p, u) = (x(k) - x_r(k))^2 + (y(k) - y_r(k))^2 + (z(k) - z_r(k))^2 + u^2(k),
where (x - x_r)^2 + (y - y_r)^2 + (z - z_r)^2 represents the cost of the tracking error, u is the underwater robot controller input, and u^2 represents the cost of the energy consumed. The value function is calculated from the one-step cost function:
V(p(k)) = E_a(k){g_k(p, u) + γV(p(k+1))},
where γ ∈ (0, 1) is a discount factor and E_a(k) represents the expectation function in state k.
Let V = W^T Φ(p); the value function can then be expressed as
W_(j+1)^T Φ(p(k)) = E_a(k)[g_k(p, u) + γ W_j^T Φ(p(k+1))],
where Φ(p) is the basis vector and W is the weight vector, solved iteratively by the least squares method.
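One value-model iteration, W_(j+1)^T Φ(p(k)) = E_a(k)[g_k + γ W_j^T Φ(p(k+1))], solved in the least-squares sense over a batch of sampled transitions, might look as follows (the data layout is an assumption: each row of `phi_k` and `phi_next` is the basis vector evaluated at one sampled state):

```python
import numpy as np

def value_weight_update(phi_k, phi_next, g_k, w_j, gamma=0.9):
    """One iteration of the value-model update: solve, in the
    least-squares sense, phi_k @ w_{j+1} = g_k + gamma * phi_next @ w_j
    over a batch of sampled transitions (phi_k, phi_next: N x m)."""
    target = g_k + gamma * phi_next @ w_j          # right-hand side per sample
    w_next, *_ = np.linalg.lstsq(phi_k, target, rcond=None)
    return w_next
```

Repeating this update until the weight change falls below the convergence threshold yields the fixed-point weight vector of the value model.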
Step five: after the value function is obtained, in the strategy improvement step the optimal tracking control strategy is solved by setting a basis vector and a weight vector. Let h(p) = U^T σ(p), where the weight vector U is updated with a gradient descent method and σ(p) is the basis vector; the control strategy is improved by minimizing the cost function, and h(p) is the control strategy obtained by the underwater robot through learning from the environment, this strategy being the optimal control strategy.
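A gradient-descent update of the policy weights U in h(p) = U^T σ(p) can be sketched as below; since the specification does not spell out the gradient computation, a central finite-difference gradient of an arbitrary per-action cost is used here as an illustrative stand-in for the analytic gradient:

```python
import numpy as np

def policy_weight_step(U, sigma_p, cost_of_action, lr=0.05, eps=1e-5):
    """One gradient-descent step on the policy weights U, minimising the
    cost of the action h(p) = U^T sigma(p).  The gradient with respect to
    each component of U is estimated by central finite differences."""
    U = np.asarray(U, dtype=float)
    grad = np.zeros_like(U)
    for i in range(U.size):
        dU = np.zeros_like(U)
        dU[i] = eps
        c_plus = cost_of_action(float((U + dU) @ sigma_p))
        c_minus = cost_of_action(float((U - dU) @ sigma_p))
        grad[i] = (c_plus - c_minus) / (2 * eps)
    return U - lr * grad
```

Iterating this step drives U toward the weights whose action minimises the cost, e.g. toward u = 1 for the quadratic cost (u - 1)^2.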
Step six: the two processes of value updating and strategy improvement are iterated in a loop; when the weight change obtained in each value-update and strategy-improvement iteration is smaller than the threshold of 0.001, convergence is deemed reached, the h obtained at the end of the iteration is input as the controller output u to the driver of the underwater robot, and the solution of the optimal control strategy in the current state is completed.
Step seven: the optimal control strategy is input into the reduced-order system obtained in step two, the state of the underwater robot is updated, and steps five and six are repeated to obtain the optimal control strategy for the next action, which is again input into step two.
The invention also discloses a method of tracking with the underwater robot, which uses the trajectory information generated by the continuous movement of a tracked object as the desired trajectory information in step 1 above, and controls the underwater robot with the reinforcement-learning-based underwater robot control method so as to track the tracked object.
The trajectory information of the tracked object can be obtained by positioning through the buoy relay.
An embodiment is specifically described below:
(1) As shown in fig. 2, in a given water area 6 m long, 5 m wide, and 1.5 m deep, an underwater robot is deployed and a buoy relay is arranged on the water surface; the buoy relay is used for the self-positioning of the underwater robot, and the control center gives the desired trajectory information of the underwater robot, x_r = 2sin(0.1k), y_r = 0.1k, z_r = 1, where k ∈ [0, ..., 100 s], and sends it to the underwater robot.
(2) The underwater robot has the kinematic model S_(k+1) = S_k + U_k + A_k, where S_k = [x(k), y(k), z(k)]^T is the position of the underwater robot, U_k = [u_x, u_y, u_z]^T is obtained by reinforcement learning, and A_k = [a_1(k), a_2(k), a_3(k)]^T is the uncertain parameter vector, with
-0.2 ≤ a_1(k) ≤ 0.3, -0.8 ≤ a_2(k) ≤ 0.7, a_3(k) = 0.
(3) The position is tracked by the reinforcement learning method, with the value function V(p(k)) = E_a(k){g_k(p, u) + γV(p(k+1))} and the discount factor γ set to 0.9. To obtain the value function, let V = W^T Φ(p); the weight vector can then be obtained by least squares iteration, with the basis vector Φ(p) = [1, x, y, x^2, y^2, xy]^T. After the value function is obtained, in the strategy improvement step the optimal tracking control strategy is solved by setting a basis vector and a weight vector: let h(p) = U^T σ(p), where the weight vector U is updated by gradient descent and σ(p) = [1, x, y]^T. The control strategy is improved by minimizing the cost function.
(4) Through the loop of the two processes of value updating and strategy improvement, when the weight change obtained in each value-update and strategy-improvement iteration is smaller than the threshold of 0.001, convergence is deemed reached, the h obtained at the end of the iteration is input as the controller output u to the driver of the underwater robot, and the solution of the optimal control strategy in the current state is completed.
(5) The optimal control strategy is input as output into the system-mapping robot output model of step 2, and the above steps are repeated in a loop to accomplish the tracking task. The above-described embodiment merely illustrates a preferred embodiment of the present invention and is not restrictive; various changes and modifications may be made to the technical solution of the present invention by those skilled in the art without departing from the spirit of the present invention, which is defined by the claims.