CN113478486A - Robot motion parameter self-adaptive control method and system based on deep reinforcement learning - Google Patents

Robot motion parameter self-adaptive control method and system based on deep reinforcement learning Download PDF

Info

Publication number
CN113478486A
CN113478486A
Authority
CN
China
Prior art keywords
robot
neural network
value
controller
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110786283.7A
Other languages
Chinese (zh)
Other versions
CN113478486B (en)
Inventor
任亮
王春雷
杨亚
邵海存
张志鹏
马保平
彭长武
李晓强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Original Assignee
Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp filed Critical Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Priority to CN202110786283.7A priority Critical patent/CN113478486B/en
Publication of CN113478486A publication Critical patent/CN113478486A/en
Application granted granted Critical
Publication of CN113478486B publication Critical patent/CN113478486B/en
Priority to PCT/CN2022/104735 priority patent/WO2022223056A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J9/161 Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The application provides a robot motion parameter self-adaptive control method and system based on deep reinforcement learning. The method comprises the following steps: building an agent in a simulation environment, the agent comprising: a strategy neural network, a value neural network and a task planning module; training a strategic neural network in the agent according to sample parameters based on guided reinforcement learning; based on layered reinforcement learning, sequentially and alternately carrying out strategy promotion and strategy evaluation on a strategy neural network and a value neural network in the intelligent agent according to a plurality of subtasks and reward functions corresponding to the subtasks to obtain a trained strategy neural network model; and outputting a control parameter optimization value to the controller according to the target task based on the trained strategy neural network model, so that the robot is controlled by the controller according to the control parameter optimization value.

Description

Robot motion parameter self-adaptive control method and system based on deep reinforcement learning
Technical Field
The application relates to the technical field of robot control, in particular to a robot motion parameter self-adaptive control method and system based on deep reinforcement learning.
Background
The control parameters play an important role in the motion performance of a quadruped robot system, and parameter selection for a traditional controller depends on professional domain knowledge and engineering experience. At present, some control methods based on deep reinforcement learning aim to realize end-to-end optimization from sensor data to motor control signals, but this technical route has a long training period and converges with difficulty, the inexplicability of the neural network means that the stability and robustness of the control system cannot be guaranteed, and if the trained model performs poorly it can only be redesigned and retrained, which greatly limits the engineering application of deep reinforcement learning in robot motion control.
Therefore, there is a need to provide an improved solution to the above-mentioned deficiencies of the prior art.
Disclosure of Invention
The present application aims to provide a robot motion parameter adaptive control method and system based on deep reinforcement learning, so as to solve or alleviate the above problems in the prior art.
In order to achieve the above purpose, the present application provides the following technical solutions:
The application provides a robot motion parameter self-adaptive control method based on deep reinforcement learning, which comprises the following steps: step S101, constructing an intelligent agent in a simulation environment, wherein the intelligent agent comprises: a strategy neural network, a value neural network and a task planning module; step S102, based on guided reinforcement learning, training the strategy neural network in the agent according to the sample parameters and the formulas:

A_{l,t} = controller(S_{l,t}) with probability p; A_{l,t} = π(S_{l,t}) with probability 1 - p

p = p_0 * 0.99^(t + l*T)

wherein the sample parameters are the control parameters of a controller of the robot; A_{l,t} represents the control parameters to be optimized in the controller, l represents the trajectory number of the simulation training of the robot in the simulation environment, t is the time step of the simulation training, controller(S_{l,t}) represents the output of the controller of the robot, π(S_{l,t}) represents the output of the strategy neural network, p represents the transition probability with which the strategy neural network transitions from supervised learning to autonomous learning, and p_0 is a preset initial value of the transition probability; step S103, based on layered reinforcement learning, sequentially and alternately performing strategy promotion and strategy evaluation on the strategy neural network and the value neural network in the agent according to a plurality of subtasks and their corresponding reward functions, to obtain a trained strategy neural network model, wherein the plurality of subtasks are obtained by decomposing a target task of the robot through the task planning module, and the reward functions are constructed by the task planning module according to the subtasks; and step S104, based on the trained strategy neural network model, outputting a control parameter optimization value to the controller according to the target task, so that the robot is controlled by the controller according to the control parameter optimization value.
Preferably, in step S101, according to the first fitting function:
A_t = π(S_t)

building a strategy neural network of the agent in the simulation environment, wherein A_t is the parameter optimization value of the controller and S_t is the state observation value of the robot collected by a sensor of the robot.
Preferably, in step S101, according to the state observation value of the robot, collected by the sensor of the robot, and the parameter optimization value of the controller, a value neural network of the agent is built in the simulation environment according to a second fitting function:

Q_t = Q(S_t, A_t)

wherein Q_t represents the evaluation of the controller parameter optimization value A_t output by the strategy neural network, and S_t is the state observation value of the robot collected by a sensor of the robot.
Preferably, in step S101, in the simulation environment, a task planning module of the agent is built based on the state observation value of the robot collected by the sensor of the robot and the environmental return, according to formulas that define the subtask reward R_t as a weighted combination of three environmental return terms (the detailed expressions are given as equation images in the original publication); wherein R_t is the reward function value of a subtask, and the environmental return r_t comprises: the walking distance r_t^dis of the robot, the body stability r_t^sta of the robot, and the energy consumption r_t^eng of the robot; r_t^dis is the forward advancing distance of the robot along the x-axis in the simulation environment; r_t^sta is computed from the rotation angles of the robot around the coordinate axes x, y and z in the simulation environment and from the offsets of the robot with respect to the y-axis and z-axis; r_t^eng is computed from the torque of the motors of the robot, the motor speed of the robot, and Δt, a time period representing the time taken by the robot to walk each step during simulation training; and α, β and μ are weight coefficients determined according to the subtasks.
Preferably, in step S103, based on hierarchical reinforcement learning, the agent performs strategy promotion on the strategy neural network according to the formula:

θ^π ← argmax_{θ^π} (1/L) Σ_l Σ_t Q(S_t, π(S_t))

and performs strategy evaluation on the value neural network according to the formula:

θ^Q ← argmin_{θ^Q} (1/L) Σ_l Σ_t (R_t + γ·Q_{t+1} - Q_t)^2

wherein θ^π represents the weights and biases of the strategy neural network, L represents the total number of trajectories of the simulation training, Q(S_t, A_t) represents the value neural network of the agent, A_t is the parameter optimization value of the controller, S_t is the state observation value of the robot collected by a sensor of the robot, and π(S_t) represents the output of the strategy neural network at time t; θ^Q represents the weights and biases of the value neural network, R_t represents the reward function value of the corresponding task at time t, γ is a discount factor with a value range of (0, 1), Q_{t+1} represents the output of the value neural network at time t + 1, and Q_t represents the output of the value neural network at time t.
Preferably, in step S103, the task planning module of the agent judges the learning progress of the plurality of subtasks until the last subtask is completed, to obtain the strategy neural network model; the judgment compares statistics of the task reward values R_t^{l_n} accumulated over recent training trajectories, and of a Boolean fall indicator (written here as fall^{l_n}), against the thresholds ε and δ (the detailed expressions are given as equation images in the original publication); wherein l_n, l_m and l_i respectively denote the n-th, m-th and i-th training trajectories, n, m and i being positive integers; R_t^{l_n} denotes the task reward value corresponding to the t-th time step in trajectory l_n; when fall^{l_n} is true, the robot has fallen; and ε and δ are different preset thresholds.
Preferably, in step S104, according to the target task, a control parameter optimization value is output to the controller by the strategy neural network model π* that maximizes the expected cumulative reward (the exact objective is given as an equation image in the original publication), so that the controller controls the robot according to the control parameter optimization value; wherein R_t represents the reward function output by the task planning module.
The embodiment of the present application further provides a robot motion parameter adaptive control system based on deep reinforcement learning, including: an agent building unit configured to build an agent in a simulation environment, the agent comprising: a strategy neural network, a value neural network and a task planning module; a first learning unit configured to, based on guided reinforcement learning, train the strategy neural network in the agent according to the sample parameters and the formulas:

A_{l,t} = controller(S_{l,t}) with probability p; A_{l,t} = π(S_{l,t}) with probability 1 - p

p = p_0 * 0.99^(t + l*T)

wherein the sample parameters are the control parameters of a controller of the robot; A_{l,t} represents the control parameters to be optimized in the controller, l represents the trajectory number of the simulation training of the robot in the simulation environment, t is the time step of the simulation training, controller(S_{l,t}) represents the output of the controller of the robot, π(S_{l,t}) represents the output of the strategy neural network, p represents the transition probability with which the strategy neural network transitions from supervised learning to autonomous learning, and p_0 is the initial value of the transition probability; a second learning unit configured to, based on layered reinforcement learning, sequentially and alternately perform strategy promotion and strategy evaluation on the strategy neural network and the value neural network in the agent according to a plurality of subtasks and their corresponding reward functions, to obtain a trained strategy neural network model, the plurality of subtasks being obtained by decomposing a target task of the robot through the task planning module, and the reward functions being constructed by the task planning module according to the subtasks; and an optimization unit configured to output a control parameter optimization value to the controller according to the target task based on the trained strategy neural network model, so that the robot is controlled by the controller according to the control parameter optimization value.
Compared with the closest prior art, the technical scheme of the embodiment of the application has the following beneficial effects:
in the technical scheme provided by the embodiment of the application, an intelligent body comprising a strategy neural network, a value neural network and a task planning module is constructed in a simulation environment, control parameters of a robot controller are used as samples through guided reinforcement learning, a supervised learning label is provided for the output action of the strategy neural network of the intelligent body, the decision of the strategy neural network is effectively guided to avoid a known low-return space region, exploration time is reduced, the transition from supervised learning to autonomous learning is realized according to probability in the training process, and the generalization capability of the intelligent body can be effectively improved; through layered reinforcement learning, a plurality of subtasks are divided into a target task according to a task planning module, corresponding reward functions are constructed according to the subtasks, strategy promotion and strategy evaluation are respectively carried out on a strategy neural network and a value neural network in an intelligent body in a dynamic planning mode, the difficulty of each subtask is ensured to be matched with the decision-making capability of the intelligent body in a corresponding learning stage, an optimal strategy neural network is obtained, the robot can adaptively optimize controller parameters according to the environment condition of the robot and the state of the robot under the condition of no manual parameter adjustment, and the environment adaptability and the robustness of the robot are improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. Wherein:
fig. 1 is a schematic flowchart of a robot motion parameter adaptive control method based on deep reinforcement learning according to some embodiments of the present application;
FIG. 2 is a network architecture diagram of a strategic neural network provided in accordance with some embodiments of the present application;
FIG. 3 is a network architecture diagram of a value neural network provided in accordance with some embodiments of the present application;
FIG. 4 is a schematic diagram of a robot motion parameter adaptive control system based on deep reinforcement learning according to some embodiments of the present application;
fig. 5 is a schematic diagram of a system architecture for adaptive control of motion parameters of a quadruped robot based on deep reinforcement learning according to some embodiments of the present application.
Detailed Description
The present application will be described in detail below with reference to the embodiments with reference to the attached drawings. The various examples are provided by way of explanation of the application and are not limiting of the application. In fact, it will be apparent to those skilled in the art that modifications and variations can be made in the present application without departing from the scope or spirit of the application. For instance, features illustrated or described as part of one embodiment, can be used with another embodiment to yield a still further embodiment. It is therefore intended that the present application cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.
Preferably, in the embodiment of the present application, the robot in the simulation environment refers to a simulation model of the robot, and a simulation model of a quadruped robot is used. The controller that controls the motion of the quadruped robot adopts a layered control structure comprising high-level leg trajectory control, gait control and bottom-level leg control. The leg control addresses the stability problem, the gait control addresses the coordinated movement of the four legs, and the leg trajectory control accurately models the interaction between the robot and the ground. The parameter optimization values of the controller of the robot include: the step length, the step frequency, the leg raising height and the ground contact force of the robot.
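For illustration only, these four quantities can be grouped into a small parameter record that the layered controller consumes each control cycle; the field names, units and the set_gait interface in the sketch below are assumptions and do not come from the patent:

```python
from dataclasses import dataclass

@dataclass
class GaitParameters:
    """Controller parameters the agent optimizes (names and units assumed)."""
    step_length: float      # m, stride length of each leg
    step_frequency: float   # Hz, gait cycle frequency
    leg_lift_height: float  # m, swing-phase foot clearance
    contact_force: float    # N, desired ground contact force

def apply_to_controller(controller, params: GaitParameters) -> None:
    # Hypothetical controller interface: the layered controller reads the
    # optimized values before planning the next leg trajectories.
    controller.set_gait(params.step_length, params.step_frequency,
                        params.leg_lift_height, params.contact_force)
```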
Fig. 1 is a schematic flowchart of a robot motion parameter adaptive control method based on deep reinforcement learning according to some embodiments of the present application; as shown in fig. 1, the robot motion parameter adaptive control method based on deep reinforcement learning includes:
step S101, constructing an intelligent agent in a simulation environment, wherein the intelligent agent comprises: a strategy neural network, a value neural network and a task planning module;
in the embodiment of the application, the simulation models of the intelligent agent and the robot need to be constructed in the simulation environment, the simulation environment is built based on pybull, the simulation models of the robot can be conveniently loaded from URDF, SDF, MJCF and other file formats, and kinematics, dynamic simulation calculation, collision detection, interference query and the like of the robot can be provided.
In the embodiment of the application, the agent is constructed in the simulation environment and trained to control the robot; after the agent finishes learning, the trained strategy neural network is deployed on the real robot. This effectively avoids problems such as the robot frequently falling down or the motors exceeding their limit positions, which are caused by poor control in the agent's early learning stage, and thus avoids hardware damage to the robot.
In the embodiment of the application, in the constructed intelligent agent, the strategy neural network is used for outputting the optimized value of the control parameter of the controller of the robot, the value neural network is used for evaluating the output effect of the strategy neural network, the task planning module is used for decomposing the target task of the robot into a plurality of subtasks, and constructing the corresponding reward function according to each subtask to ensure that the difficulty of each subtask is matched with the decision-making capability of the intelligent agent in the corresponding learning stage.
In the embodiment of the application, the learning task of the robot is to realize stable walking on complex terrain and reduce energy consumption at the same time. The task planning module breaks the task into three subtasks: the first subtask is to realize that the robot walks for a sufficient distance to avoid falling or stepping in place, regardless of the quality of the finished action; the second subtask is to ensure the stability of the robot during movement and reduce the vibration and shake of the machine body; the third subtask aims at achieving the lowest energy consumption on the basis of the first two goals.
In some alternative embodiments, a strategy neural network of the agent is constructed in the simulation environment according to the first fitting function, shown in formula (1):

A_t = π(S_t)    (1)

where A_t is the parameter optimization value of the controller and S_t is the state observation value of the robot collected by a sensor of the robot. Specifically, the state observation value of the robot is collected by an inertial measurement unit and foot-end pressure sensors of the robot, and includes: the leg phase, touchdown detection, quaternion, angular velocity and linear acceleration of the robot.
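One possible way to assemble this observation vector S_t in PyBullet is sketched below; the leg-phase input and the foot link indices are assumptions, and the linear acceleration is approximated by differencing base velocities because PyBullet does not expose it directly:

```python
import numpy as np
import pybullet as p

def get_observation(robot_id, foot_links, leg_phase, prev_lin_vel, dt):
    """Concatenate leg phase, touchdown flags, base quaternion, angular
    velocity and an approximated linear acceleration into S_t."""
    touchdown = [1.0 if p.getContactPoints(bodyA=robot_id, linkIndexA=link)
                 else 0.0 for link in foot_links]
    _, quaternion = p.getBasePositionAndOrientation(robot_id)
    lin_vel, ang_vel = p.getBaseVelocity(robot_id)
    lin_acc = (np.asarray(lin_vel) - np.asarray(prev_lin_vel)) / dt
    obs = np.concatenate([leg_phase, touchdown, quaternion, ang_vel, lin_acc])
    return obs.astype(np.float32), np.asarray(lin_vel)
```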
In the embodiment of the application, the input layer of the strategy neural network is the state observation value S_t of the robot; the strategy neural network has 4 hidden layers, where the activation function of the first 3 hidden layers adopts Tanh(32) and the activation function of the 4th hidden layer adopts Tanh(4); the output layer of the strategy neural network is the parameter optimization value A_t of the controller. The network structure of the strategy neural network is shown in Fig. 2.
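Read literally (three 32-unit Tanh hidden layers followed by a 4-unit Tanh layer whose output is taken as A_t), the strategy neural network could be sketched as follows; PyTorch and the observation dimension are assumptions used only for illustration:

```python
import torch
import torch.nn as nn

class StrategyNetwork(nn.Module):
    """π(S_t) -> A_t: maps the state observation to the four controller
    parameter optimization values (step length, step frequency, leg
    raising height, ground contact force)."""
    def __init__(self, obs_dim: int = 18, act_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.Tanh(),  # hidden layer 1, Tanh(32)
            nn.Linear(32, 32), nn.Tanh(),       # hidden layer 2, Tanh(32)
            nn.Linear(32, 32), nn.Tanh(),       # hidden layer 3, Tanh(32)
            nn.Linear(32, act_dim), nn.Tanh(),  # hidden layer 4, Tanh(4)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)
```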
In some optional embodiments, a value neural network of the agent is built in the simulation environment according to the state observation value of the robot, collected by the sensor of the robot, and the parameter optimization value of the controller, according to the second fitting function shown in formula (2):

Q_t = Q(S_t, A_t)    (2)

where Q_t represents the evaluation of the controller parameter optimization value A_t output by the strategy neural network, and is used to characterize how well the robot is controlled when the controller uses the parameter optimization value A_t; S_t is the state observation value of the robot collected by a sensor of the robot.

In the embodiment of the application, the input layer of the value neural network is (S_t, A_t); the value neural network comprises 2 hidden layers, both of which adopt the Relu(32) activation function; the output layer of the value neural network is the evaluation of the controller parameter optimization value A_t. The network structure of the value neural network is shown in Fig. 3.
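Under the same illustrative assumptions, the value neural network takes the concatenated pair (S_t, A_t) and returns a scalar evaluation Q_t:

```python
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Q(S_t, A_t) -> Q_t: evaluates the controller parameters chosen by π."""
    def __init__(self, obs_dim: int = 18, act_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 32), nn.ReLU(),  # hidden layer 1, Relu(32)
            nn.Linear(32, 32), nn.ReLU(),                 # hidden layer 2, Relu(32)
            nn.Linear(32, 1),                             # scalar evaluation Q_t
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))
```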
In some optional embodiments, in the simulation environment, a task planning module of the agent is built according to the state observation value and the environmental reward of the robot, collected by the sensor of the robot, according to formula (3), which defines the subtask reward R_t as a weighted combination of three environmental return terms (the detailed expressions of formula (3) are given as equation images in the original publication). Here, R_t is the reward function value of a subtask, and the environmental reward r_t comprises: the walking distance r_t^dis of the robot, the body stability r_t^sta of the robot, and the energy consumption r_t^eng of the robot; r_t^dis is the forward advancing distance of the robot along the x-axis in the simulation environment; r_t^sta is computed from the rotation angles of the robot around the coordinate axes x, y and z in the simulation environment and from the offsets of the robot with respect to the y-axis and z-axis; r_t^eng is computed from the torque of the motors of the robot, the motor speed of the robot, and Δt, a time period representing the time taken by the robot to walk each step during simulation training; and α, β and μ are weight coefficients determined according to the subtasks.
In the embodiment of the application, the task planning module decomposes the task into three subtasks, and α, β and μ are given different weight combinations according to the subtask. In the first subtask, the robot walks a sufficient distance to avoid falling or stepping in place, regardless of the quality of the motion, with (α, β, μ) = (0.07, 0.05, 0.03); in the second subtask, the stability of the robot during movement is ensured and the vibration and shake of the body are reduced, with (α, β, μ) = (0.07, 0.09, 0.03); in the third subtask, the goal is to achieve the lowest energy consumption on the basis of the first two goals, with (α, β, μ) = (0.07, 0.09, 0.05).
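A hedged sketch of how the per-subtask reward could be computed is shown below; only the weight values come from the patent text, while the weighted-sum form and the signs of the stability and energy terms are assumptions, since the exact expressions are given only as equation images:

```python
# Subtask-dependent weight combinations (alpha, beta, mu) listed in the patent.
SUBTASK_WEIGHTS = {
    1: (0.07, 0.05, 0.03),  # walk far enough, ignore motion quality
    2: (0.07, 0.09, 0.03),  # keep the body stable, reduce vibration and shake
    3: (0.07, 0.09, 0.05),  # additionally minimize energy consumption
}

def subtask_reward(subtask, r_distance, r_stability, r_energy):
    """R_t for the current subtask (combination form assumed): reward forward
    progress, penalize body instability and energy consumption."""
    alpha, beta, mu = SUBTASK_WEIGHTS[subtask]
    return alpha * r_distance - beta * r_stability - mu * r_energy
```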
Step S102, based on guided reinforcement learning, the strategy neural network in the agent is trained according to the sample parameters and formulas (4) and (5):

A_{l,t} = controller(S_{l,t}) with probability p; A_{l,t} = π(S_{l,t}) with probability 1 - p    (4)

p = p_0 * 0.99^(t + l*T)    (5)

wherein the sample parameters are the control parameters of the controller of the robot; A_{l,t} represents the control parameters to be optimized in the controller, l represents the trajectory number of the simulation training of the robot in the simulation environment, t is the time step of the simulation training, controller(S_{l,t}) represents the output of the controller of the robot, π(S_{l,t}) represents the output of the strategy neural network, p represents the transition probability with which the strategy neural network transitions from supervised learning to autonomous learning, and p_0 is a preset initial value of the transition probability. Here, it should be noted that the larger the value of p_0, the slower the transition of the strategy neural network from supervised learning to autonomous learning.
In the embodiment of the application, through guided reinforcement learning, the control parameters of the robot controller are used as samples, a label for supervised learning is provided for the output action of the strategy neural network of the agent, the decision of the strategy neural network is effectively guided to avoid the known low-return spatial region, the exploration time is reduced, the transition from the supervised learning to the autonomous learning is realized according to the probability in the training process, and the generalization capability of the agent can be effectively improved.
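A minimal sketch of this guided exploration step, assuming the decay law p = p_0 * 0.99^(t + l*T) and a simple random choice between the hand-tuned controller output and the strategy network output (the helper names are hypothetical):

```python
import random

def guided_action(strategy_net, controller, state, l, t, T, p0=0.9):
    """With probability p use the existing controller's parameters as a
    supervised sample; otherwise let the strategy network act autonomously."""
    p = p0 * 0.99 ** (t + l * T)  # transition probability decays over training
    if random.random() < p:
        return controller(state), p  # expert output also serves as a label
    return strategy_net(state), p
```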
Step S103, based on layered reinforcement learning, sequentially and alternately performing strategy promotion and strategy evaluation on a strategy neural network and a value neural network in the agent according to a plurality of subtasks and corresponding reward functions thereof to obtain a trained strategy neural network model; the plurality of subtasks are obtained by decomposing a target task of the robot through the task planning module, and the reward function is constructed by the task planning module according to the subtasks;
in the embodiment of the application, through layered reinforcement learning, a plurality of subtasks are divided into a target task according to a task planning module, corresponding reward functions are constructed according to the subtasks, policy promotion and policy evaluation are respectively carried out on a policy neural network and a value neural network in an intelligent body in a dynamic planning mode, the difficulty of each subtask is ensured to be matched with the decision-making capability of the intelligent body in a corresponding learning stage, an optimal policy neural network is obtained, the robot can adaptively optimize controller parameters according to the environment condition of the robot and the state of the robot under the condition of no manual parameter adjustment, and the environment adaptability and the robustness of the robot are improved.
In some optional embodiments, based on hierarchical reinforcement learning, the agent performs strategy promotion on the strategy neural network according to formula (6) and performs strategy evaluation on the value neural network according to formula (7):

θ^π ← argmax_{θ^π} (1/L) Σ_l Σ_t Q(S_t, π(S_t))    (6)

θ^Q ← argmin_{θ^Q} (1/L) Σ_l Σ_t (R_t + γ·Q_{t+1} - Q_t)^2    (7)

where θ^π represents the weights and biases of the strategy neural network, L represents the total number of trajectories of the simulation training, Q(S_t, A_t) represents the value neural network of the agent, A_t is the parameter optimization value of the controller, S_t is the state observation value of the robot collected by a sensor of the robot, and π(S_t) represents the output of the strategy neural network at time t; θ^Q represents the weights and biases of the value neural network, R_t represents the reward function value of the corresponding task at time t, γ is a discount factor with a value range of (0, 1), Q_{t+1} represents the output of the value neural network at time t + 1, and Q_t represents the output of the value neural network at time t.
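In actor-critic terms, the two alternating steps can be sketched as below; this is a simplified single-batch update under assumed optimizer settings and an assumed reconstruction of formulas (6) and (7), not the patent's exact procedure:

```python
import torch

def strategy_evaluation(value_net, value_opt, S, A, R, S_next, strategy_net, gamma=0.99):
    """Update θ^Q by minimizing the TD error (R_t + γ·Q_{t+1} - Q_t)^2."""
    with torch.no_grad():
        target = R + gamma * value_net(S_next, strategy_net(S_next)).squeeze(-1)
    td_error = target - value_net(S, A).squeeze(-1)
    loss = (td_error ** 2).mean()
    value_opt.zero_grad()
    loss.backward()
    value_opt.step()

def strategy_promotion(strategy_net, strategy_opt, value_net, S):
    """Update θ^π by ascending Q(S_t, π(S_t)); only the strategy network's
    parameters are stepped, so the value network is left unchanged."""
    loss = -value_net(S, strategy_net(S)).mean()
    strategy_opt.zero_grad()
    loss.backward()
    strategy_opt.step()
```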
In a specific example, the task planning module of the agent judges the learning progress of the plurality of subtasks according to formula (8) until the last subtask is completed, so as to obtain the strategy neural network model. Formula (8) compares statistics of the task reward values R_t^{l_n} accumulated over recent training trajectories, and of a Boolean fall indicator (written here as fall^{l_n}), against the thresholds ε and δ (the detailed expressions are given as equation images in the original publication); l_n, l_m and l_i respectively denote the n-th, m-th and i-th training trajectories, n, m and i being positive integers; R_t^{l_n} denotes the task reward value corresponding to the t-th time step in trajectory l_n; when fall^{l_n} is true, the robot has fallen; and ε and δ are different preset thresholds.
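One plausible reading of this progression test is sketched below, under the assumption that the module averages recent per-trajectory task rewards against ε and recent fall rates against δ; the exact criterion of formula (8) is available only as equation images:

```python
import numpy as np

def subtask_completed(trajectory_returns, trajectory_falls, window=20,
                      eps=1.0, delta=0.1):
    """Advance to the next subtask when the mean return of the last `window`
    trajectories exceeds eps and their fall rate stays below delta (assumed)."""
    if len(trajectory_returns) < window:
        return False
    mean_return = np.mean(trajectory_returns[-window:])
    fall_rate = np.mean(trajectory_falls[-window:])  # falls recorded as 0/1
    return mean_return > eps and fall_rate < delta
```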
Further, according to the target task, a control parameter optimization value is output to the controller by the strategy neural network model π* selected according to formula (9), so that the robot is controlled by the controller according to the control parameter optimization value; π* is the strategy neural network model that maximizes the expected cumulative reward, i.e. the trained, optimal strategy neural network model (the exact objective of formula (9) is given as an equation image in the original publication), and R_t represents the reward function output by the task planning module.
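At deployment time the trained strategy network simply runs inside the control loop; the sketch below is illustrative, with hypothetical sensor and controller interfaces:

```python
import torch

def control_step(strategy_net, sensors, controller):
    """One adaptive control cycle with the trained model: observe the state,
    output optimized controller parameters, then let the layered controller act."""
    obs = torch.as_tensor(sensors.read(), dtype=torch.float32)  # hypothetical sensor API
    with torch.no_grad():
        params = strategy_net(obs).numpy()  # step length, step frequency,
                                            # leg raising height, contact force
    controller.set_gait(*params)            # hypothetical controller API
    controller.execute()
```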
In the embodiment of the application, an agent comprising a strategy neural network, a value neural network and a task planning module is constructed in a simulation environment, control parameters of a robot controller are used as samples through guided reinforcement learning, a label of supervised learning is provided for output actions of the strategy neural network of the agent, decisions of the strategy neural network are effectively guided to avoid a known low-return space region, exploration time is reduced, transition from the supervised learning to autonomous learning is realized according to probability in a training process, and the generalization capability of the agent can be effectively improved; through layered reinforcement learning, a plurality of subtasks are divided into a target task according to a task planning module, corresponding reward functions are constructed according to the subtasks, strategy promotion and strategy evaluation are respectively carried out on a strategy neural network and a value neural network in an intelligent body in a dynamic planning mode, the difficulty of each subtask is ensured to be matched with the decision-making capability of the intelligent body in a corresponding learning stage, an optimal strategy neural network is obtained, the robot can adaptively optimize controller parameters according to the environment condition of the robot and the state of the robot under the condition of no manual parameter adjustment, and the environment adaptability and the robustness of the robot are improved.
Exemplary System
FIG. 4 is a schematic mechanism diagram of a robot motion parameter adaptive control system based on deep reinforcement learning according to some embodiments of the present application; as shown in fig. 4, the robot motion parameter adaptive control system based on deep reinforcement learning includes: an agent building unit 401, a first learning unit 402, a second learning unit 403 and an optimization unit 404.
Agent building unit 401 is configured to build agents in a simulation environment, the agents including: a strategic neural network, a value neural network and a mission planning module.
The first learning unit 402 is configured to, based on guided reinforcement learning, train the strategy neural network in the agent according to the sample parameters and the formulas:

A_{l,t} = controller(S_{l,t}) with probability p; A_{l,t} = π(S_{l,t}) with probability 1 - p

p = p_0 * 0.99^(t + l*T)

wherein the sample parameters are the control parameters of a controller of the robot; A_{l,t} represents the control parameters to be optimized in the controller, l represents the trajectory number of the simulation training of the robot in the simulation environment, t is the time step of the simulation training, controller(S_{l,t}) represents the output of the controller of the robot, π(S_{l,t}) represents the output of the strategy neural network, p represents the transition probability with which the strategy neural network transitions from supervised learning to autonomous learning, and p_0 is the initial value of the transition probability.
The second learning unit 403 is configured to perform policy promotion and policy evaluation on the policy neural network and the value neural network in the agent in turn and alternately according to a plurality of subtasks and reward functions corresponding to the subtasks based on hierarchical reinforcement learning, so as to obtain a trained policy neural network model; the plurality of subtasks are obtained by decomposing a target task of the robot through the task planning module, and the reward function is constructed by the task planning module according to the subtasks;
the optimization unit 404 is configured to output a control parameter optimization value to the controller according to the target task based on the trained strategic neural network model, so that the robot is controlled by the controller according to the control parameter optimization value.
In some optional embodiments, the agent building unit 401 is further configured to build a strategy neural network of the agent in the simulation environment according to the first fitting function:

A_t = π(S_t)

wherein A_t is the parameter optimization value of the controller and S_t is the state observation value of the robot collected by a sensor of the robot.
In some optional embodiments, the state observation of the robot is collected by an inertial measurement unit and a foot end pressure sensor of the robot, wherein the state observation comprises: leg phase, touchdown detection, quaternion, angular velocity, and linear acceleration of the robot.
In some optional embodiments, the agent building unit 401 is further configured to build a value neural network of the agent in the simulation environment according to the state observation value of the robot, collected by the sensor of the robot, and the parameter optimization value of the controller, according to the second fitting function:

Q_t = Q(S_t, A_t)

wherein Q_t represents the evaluation of the controller parameter optimization value A_t output by the strategy neural network, and S_t is the state observation value of the robot collected by a sensor of the robot.
In some optional embodiments, the agent building unit 401 is further configured to build a task planning module of the agent in the simulation environment according to the state observation value of the robot collected by the sensor of the robot and the environmental return, according to formulas that define the subtask reward R_t as a weighted combination of three environmental return terms (the detailed expressions are given as equation images in the original publication); wherein R_t is the reward function value of a subtask, and the environmental return r_t comprises: the walking distance r_t^dis of the robot, the body stability r_t^sta of the robot, and the energy consumption r_t^eng of the robot; r_t^dis is the forward advancing distance of the robot along the x-axis in the simulation environment; r_t^sta is computed from the rotation angles of the robot around the coordinate axes x, y and z in the simulation environment and from the offsets of the robot with respect to the y-axis and z-axis; r_t^eng is computed from the torque of the motors of the robot, the motor speed of the robot, and Δt, a time period representing the time taken by the robot to walk each step during simulation training; and α, β and μ are weight coefficients determined according to the subtasks.
In some optional embodiments, the second learning unit 403 is further configured so that, based on hierarchical reinforcement learning, the agent performs strategy promotion on the strategy neural network according to the formula:

θ^π ← argmax_{θ^π} (1/L) Σ_l Σ_t Q(S_t, π(S_t))

and performs strategy evaluation on the value neural network according to the formula:

θ^Q ← argmin_{θ^Q} (1/L) Σ_l Σ_t (R_t + γ·Q_{t+1} - Q_t)^2

wherein θ^π represents the weights and biases of the strategy neural network, L represents the total number of trajectories of the simulation training, Q(S_t, A_t) represents the value neural network of the agent, A_t is the parameter optimization value of the controller, S_t is the state observation value of the robot collected by a sensor of the robot, and π(S_t) represents the output of the strategy neural network at time t;

θ^Q represents the weights and biases of the value neural network, R_t represents the reward function value of the corresponding task at time t, γ is a discount factor with a value range of (0, 1), Q_{t+1} represents the output of the value neural network at time t + 1, and Q_t represents the output of the value neural network at time t.
In some optional embodiments, the second learning unit 403 is further configured so that the task planning module of the agent judges the learning progress of the plurality of subtasks until the last subtask is completed, to obtain the strategy neural network model; the judgment compares statistics of the task reward values R_t^{l_n} accumulated over recent training trajectories, and of a Boolean fall indicator (written here as fall^{l_n}), against the thresholds ε and δ (the detailed expressions are given as equation images in the original publication);

wherein l_n, l_m and l_i respectively denote the n-th, m-th and i-th training trajectories, n, m and i being positive integers; R_t^{l_n} denotes the task reward value corresponding to the t-th time step in trajectory l_n; when fall^{l_n} is true, the robot has fallen; and ε and δ are different preset thresholds.
In some optional embodiments, the optimization unit 404 is further configured to, according to the target task, output a control parameter optimization value to the controller using the strategy neural network model π* that maximizes the expected cumulative reward (the exact objective is given as an equation image in the original publication), so that the controller controls the robot according to the control parameter optimization value; R_t represents the reward function output by the task planning module.
In some optional embodiments, the parameter optimization values of the controller include: the step length, the step frequency, the leg raising height and the ground contact force of the robot.
FIG. 5 is a schematic diagram of a system architecture for adaptive control of motion parameters of a quadruped robot based on deep reinforcement learning according to some embodiments of the present application; as shown in fig. 5, in the architecture diagram of the adaptive control system for motion parameters of the quadruped robot based on deep reinforcement learning, a controller for controlling the motion of the quadruped robot adopts a layered control architecture, which includes a high-level leg trajectory control, a gait control and a bottom-level leg control, wherein the leg control is used for solving the stability problem, the gait control is used for solving the coordinated motion problem of four legs of the robot, and the interaction between the robot and the ground is accurately modeled during the leg trajectory control.
The robot motion parameter adaptive control system based on deep reinforcement learning provided by the embodiment of the application can realize the processes and steps of any robot motion parameter adaptive control method embodiment based on deep reinforcement learning, and achieve the same technical effect, and is not repeated here.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. A robot motion parameter self-adaptive control method based on deep reinforcement learning is characterized by comprising the following steps:
step S101, constructing an intelligent agent in a simulation environment, wherein the intelligent agent comprises: a strategy neural network, a value neural network and a task planning module;
step S102, based on guided reinforcement learning, training the strategy neural network in the agent according to the sample parameters and the formulas:

A_{l,t} = controller(S_{l,t}) with probability p; A_{l,t} = π(S_{l,t}) with probability 1 - p

p = p_0 * 0.99^(t + l*T)

wherein the sample parameters are the control parameters of a controller of the robot; A_{l,t} represents the control parameters to be optimized in the controller, l represents the trajectory number of the simulation training of the robot in the simulation environment, t is the time step of the simulation training, controller(S_{l,t}) represents the output of the controller of the robot, π(S_{l,t}) represents the output of the strategy neural network, p represents the transition probability with which the strategy neural network transitions from supervised learning to autonomous learning, and p_0 is a preset initial value of the transition probability;
step S103, based on layered reinforcement learning, sequentially and alternately performing strategy promotion and strategy evaluation on a strategy neural network and a value neural network in the agent according to a plurality of subtasks and corresponding reward functions thereof to obtain a trained strategy neural network model; the plurality of subtasks are obtained by decomposing a target task of the robot through the task planning module, and the reward function is constructed by the task planning module according to the subtasks;
and S104, outputting a control parameter optimization value to the controller according to the target task based on the trained strategy neural network model, so that the robot is controlled by the controller according to the control parameter optimization value.
2. The adaptive control method for motion parameters of a robot based on deep reinforcement learning of claim 1, wherein in step S101,
according to the first fitting function:

A_t = π(S_t)

building a strategy neural network of the agent in the simulation environment, wherein A_t is the parameter optimization value of the controller and S_t is the state observation value of the robot collected by a sensor of the robot.
3. The adaptive control method for motion parameters of a robot based on deep reinforcement learning of claim 1, wherein in step S101,
according to the state observation value of the robot, collected by the sensor of the robot, and the parameter optimization value of the controller, according to a second fitting function:

Q_t = Q(S_t, A_t)

building a value neural network of the agent in the simulation environment, wherein Q_t represents the evaluation of the controller parameter optimization value A_t output by the strategy neural network, and S_t is the state observation value of the robot collected by a sensor of the robot.
4. The adaptive control method for motion parameters of a robot based on deep reinforcement learning of claim 1, wherein in step S101,
in the simulation environment, according to the state observation value and the environmental return of the robot, collected by a sensor of the robot, building a task planning module of the agent according to formulas that define the subtask reward R_t as a weighted combination of three environmental return terms (the detailed expressions are given as equation images in the original publication);

wherein R_t is the reward function value of a subtask, and the environmental return r_t comprises: the walking distance r_t^dis of the robot, the body stability r_t^sta of the robot, and the energy consumption r_t^eng of the robot; r_t^dis is the forward advancing distance of the robot along the x-axis in the simulation environment; r_t^sta is computed from the rotation angles of the robot around the coordinate axes x, y and z in the simulation environment and from the offsets of the robot with respect to the y-axis and z-axis; r_t^eng is computed from the torque of the motors of the robot, the motor speed of the robot, and Δt, a time period representing the time taken by the robot to walk each step during simulation training; and α, β and μ are weight coefficients determined according to the subtasks.
5. The adaptive control method for motion parameters of a robot based on deep reinforcement learning of claim 1, wherein in step S103,
based on layered reinforcement learning, the agent performs strategy promotion on the strategy neural network according to the formula:

θ^π ← argmax_{θ^π} (1/L) Σ_l Σ_t Q(S_t, π(S_t))

and performs strategy evaluation on the value neural network according to the formula:

θ^Q ← argmin_{θ^Q} (1/L) Σ_l Σ_t (R_t + γ·Q_{t+1} - Q_t)^2

wherein θ^π represents the weights and biases of the strategy neural network, L represents the total number of trajectories of the simulation training, Q(S_t, A_t) represents the value neural network of the agent, A_t is the parameter optimization value of the controller, S_t is the state observation value of the robot collected by a sensor of the robot, and π(S_t) represents the output of the strategy neural network at time t;

θ^Q represents the weights and biases of the value neural network, R_t represents the reward function value of the corresponding task at time t, γ is a discount factor with a value range of (0, 1), Q_{t+1} represents the output of the value neural network at time t + 1, and Q_t represents the output of the value neural network at time t.
6. The adaptive control method for motion parameters of a robot based on deep reinforcement learning of claim 5, wherein in step S103,
the task planning module of the agent judges the learning progress of the plurality of subtasks according to a formula until the last subtask is completed, to obtain the strategy neural network model; the formula compares statistics of the task reward values R_t^{l_n} accumulated over recent training trajectories, and of a Boolean fall indicator (written here as fall^{l_n}), against the thresholds ε and δ (the detailed expressions are given as equation images in the original publication);

wherein l_n, l_m and l_i respectively denote the n-th, m-th and i-th training trajectories, n, m and i being positive integers; R_t^{l_n} denotes the task reward value corresponding to the t-th time step in trajectory l_n; when fall^{l_n} is true, the robot has fallen; and ε and δ are different preset thresholds.
7. The adaptive control method for motion parameters of a robot based on deep reinforcement learning of claim 6, wherein in step S104,
according to the target task, outputting a control parameter optimization value to the controller by the strategy neural network model π* that maximizes the expected cumulative reward (the exact objective is given as an equation image in the original publication), so that the controller controls the robot according to the control parameter optimization value;

wherein R_t represents the reward function output by the task planning module.
8. A robot motion parameter adaptive control system based on deep reinforcement learning is characterized by comprising:
an agent building unit configured to build an agent in a simulation environment, the agent comprising: a strategy neural network, a value neural network and a task planning module;
a first learning unit configured to, based on guided reinforcement learning, train the strategy neural network in the agent according to the sample parameters and the formulas:

A_{l,t} = controller(S_{l,t}) with probability p; A_{l,t} = π(S_{l,t}) with probability 1 - p

p = p_0 * 0.99^(t + l*T)

wherein the sample parameters are the control parameters of a controller of the robot; A_{l,t} represents the control parameters to be optimized in the controller, l represents the trajectory number of the simulation training of the robot in the simulation environment, t is the time step of the simulation training, controller(S_{l,t}) represents the output of the controller of the robot, π(S_{l,t}) represents the output of the strategy neural network, p represents the transition probability with which the strategy neural network transitions from supervised learning to autonomous learning, and p_0 is the initial value of the transition probability;
the second learning unit is configured to perform strategy promotion and strategy evaluation on the strategy neural network and the value neural network in the agent alternately in sequence according to a plurality of subtasks and reward functions corresponding to the subtasks based on layered reinforcement learning to obtain a trained strategy neural network model; the plurality of subtasks are obtained by decomposing a target task of the robot through the task planning module, and the reward function is constructed by the task planning module according to the subtasks;
and the optimization unit is configured to output a control parameter optimization value to the controller according to the target task based on the trained strategy neural network model, so that the robot is controlled by the controller according to the control parameter optimization value.
CN202110786283.7A 2021-07-12 2021-07-12 Robot motion parameter self-adaptive control method and system based on deep reinforcement learning Active CN113478486B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110786283.7A CN113478486B (en) 2021-07-12 2021-07-12 Robot motion parameter self-adaptive control method and system based on deep reinforcement learning
PCT/CN2022/104735 WO2022223056A1 (en) 2021-07-12 2022-07-08 Robot motion parameter adaptive control method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110786283.7A CN113478486B (en) 2021-07-12 2021-07-12 Robot motion parameter self-adaptive control method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113478486A true CN113478486A (en) 2021-10-08
CN113478486B CN113478486B (en) 2022-05-17

Family

ID=77938821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110786283.7A Active CN113478486B (en) 2021-07-12 2021-07-12 Robot motion parameter self-adaptive control method and system based on deep reinforcement learning

Country Status (2)

Country Link
CN (1) CN113478486B (en)
WO (1) WO2022223056A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115167136A (en) * 2022-07-21 2022-10-11 中国人民解放军国防科技大学 Intelligent agent control method based on deep reinforcement learning and conditional entropy bottleneck
CN115238599A (en) * 2022-06-20 2022-10-25 中国电信股份有限公司 Energy-saving method for refrigerating system and model reinforcement learning training method and device
WO2022223056A1 (en) * 2021-07-12 2022-10-27 上海微电机研究所(中国电子科技集团公司第二十一研究所) Robot motion parameter adaptive control method and system based on deep reinforcement learning
CN115533905A (en) * 2022-10-09 2022-12-30 清华大学 Virtual and real transfer learning method and device of robot operation technology and storage medium
CN116713999A (en) * 2023-08-07 2023-09-08 南京云创大数据科技股份有限公司 Training method and training device for multi-mechanical arm multi-target searching
CN116834018A (en) * 2023-08-07 2023-10-03 南京云创大数据科技股份有限公司 Training method and training device for multi-mechanical arm multi-target searching
CN118070840A (en) * 2024-04-19 2024-05-24 中国海洋大学 Multi-foot robot static standing posture analysis method, system and application

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117086866B (en) * 2023-08-07 2024-04-12 广州中鸣数码科技有限公司 Task planning training method and device based on programming robot

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11429854B2 (en) * 2016-12-04 2022-08-30 Technion Research & Development Foundation Limited Method and device for a computerized mechanical device
US10786900B1 (en) * 2018-09-27 2020-09-29 Deepmind Technologies Limited Robot control policy determination through constrained optimization for smooth continuous control
CN110861084B (en) * 2019-11-18 2022-04-05 东南大学 Four-legged robot falling self-resetting control method based on deep reinforcement learning
CN111208822A (en) * 2020-02-17 2020-05-29 清华大学深圳国际研究生院 Quadruped robot gait control method based on reinforcement learning and CPG controller
CN111580385A (en) * 2020-05-11 2020-08-25 深圳阿米嘎嘎科技有限公司 Robot walking control method, system and medium based on deep reinforcement learning
CN113478486B (en) * 2021-07-12 2022-05-17 上海微电机研究所(中国电子科技集团公司第二十一研究所) Robot motion parameter self-adaptive control method and system based on deep reinforcement learning

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0901053A1 (en) * 1997-09-04 1999-03-10 Rijksuniversiteit te Groningen Method for modelling and/or controlling a production process using a neural network and controller for a production process
US9008840B1 (en) * 2013-04-19 2015-04-14 Brain Corporation Apparatus and methods for reinforcement-guided supervised learning
CN109693239A (en) * 2018-12-29 2019-04-30 深圳市越疆科技有限公司 A robot grasping method based on deep reinforcement learning
US20210086355A1 (en) * 2019-09-19 2021-03-25 Lg Electronics Inc. Control server and method for controlling robot using artificial neural network, and robot implementing the same
US20210089891A1 (en) * 2019-09-24 2021-03-25 Hrl Laboratories, Llc Deep reinforcement learning based method for surreptitiously generating signals to fool a recurrent neural network
CN111176122A (en) * 2020-02-11 2020-05-19 哈尔滨工程大学 Underwater robot parameter self-adaptive backstepping control method based on double BP neural network Q learning technology
CN111421538A (en) * 2020-03-31 2020-07-17 西安交通大学 Deep reinforcement learning robot control method based on prioritized experience replay
CN111638646A (en) * 2020-05-29 2020-09-08 平安科技(深圳)有限公司 Four-legged robot walking controller training method and device, terminal and storage medium
CN112338921A (en) * 2020-11-16 2021-02-09 西华师范大学 Mechanical arm intelligent control rapid training method based on deep reinforcement learning
CN112621714A (en) * 2020-12-02 2021-04-09 上海微电机研究所(中国电子科技集团公司第二十一研究所) Upper limb exoskeleton robot control method and device based on LSTM neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA, JC et al.: "Multi-robot Target Encirclement Control with Collision Avoidance via Deep Reinforcement Learning", Journal of Intelligent & Robotic Systems *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022223056A1 (en) * 2021-07-12 2022-10-27 上海微电机研究所(中国电子科技集团公司第二十一研究所) Robot motion parameter adaptive control method and system based on deep reinforcement learning
CN115238599A (en) * 2022-06-20 2022-10-25 中国电信股份有限公司 Energy-saving method for refrigerating system and model reinforcement learning training method and device
CN115238599B (en) * 2022-06-20 2024-02-27 中国电信股份有限公司 Energy-saving method and model reinforcement learning training method and device for refrigerating system
CN115167136A (en) * 2022-07-21 2022-10-11 中国人民解放军国防科技大学 Intelligent agent control method based on deep reinforcement learning and conditional entropy bottleneck
CN115533905A (en) * 2022-10-09 2022-12-30 清华大学 Virtual and real transfer learning method and device of robot operation technology and storage medium
CN115533905B (en) * 2022-10-09 2024-06-04 清华大学 Virtual-real transfer learning method and device for robot operation skills and storage medium
CN116713999A (en) * 2023-08-07 2023-09-08 南京云创大数据科技股份有限公司 Training method and training device for multi-mechanical arm multi-target searching
CN116834018A (en) * 2023-08-07 2023-10-03 南京云创大数据科技股份有限公司 Training method and training device for multi-mechanical arm multi-target searching
CN116713999B (en) * 2023-08-07 2023-10-20 南京云创大数据科技股份有限公司 Training method and training device for multi-mechanical arm multi-target searching
CN118070840A (en) * 2024-04-19 2024-05-24 中国海洋大学 Multi-foot robot static standing posture analysis method, system and application

Also Published As

Publication number Publication date
CN113478486B (en) 2022-05-17
WO2022223056A1 (en) 2022-10-27

Similar Documents

Publication Title
CN113478486A (en) Robot motion parameter self-adaptive control method and system based on deep reinforcement learning
JP4587738B2 (en) Robot apparatus and robot posture control method
US8996177B2 (en) Robotic training apparatus and methods
CN101916071B (en) CPG feedback control method of biomimetic robot fish movement
US20170001309A1 (en) Robotic training apparatus and methods
CN113821045B (en) Reinforcement learning action generation system for a legged robot
Wu et al. Neurally controlled steering for collision-free behavior of a snake robot
CN113093779B (en) Robot motion control method and system based on deep reinforcement learning
CN111783994A (en) Training method and device for reinforcement learning
CN114047697B (en) Four-foot robot balance inverted pendulum control method based on deep reinforcement learning
CN114563954A (en) Quadruped robot motion control method based on reinforcement learning and position increment
Tieck et al. Generating pointing motions for a humanoid robot by combining motor primitives
CN117555339B (en) Policy network training method and humanoid biped robot gait control method
Hu et al. Hybrid learning architecture for fuzzy control of quadruped walking robots
CN116062059B (en) Single-leg robot continuous jump control method based on deep reinforcement learning
Jiang et al. Stable skill improvement of quadruped robot based on privileged information and curriculum guidance
CN117572877B (en) Biped robot gait control method, biped robot gait control device, storage medium and equipment
CN117140527B (en) Mechanical arm control method and system based on deep reinforcement learning algorithm
CN117311271A (en) Deep reinforcement learning robot motion control method and system based on prior knowledge
CN114771783B (en) Control method and system for submarine stratum space robot
Zhu et al. Learning of Quadruped Robot Motor Skills Based on Policy Constrained TD3
CN117518821A (en) Fault feature extraction-based spine quadruped robot fault-tolerant gait control method
Imaduddin et al. Intelligent Biped Robot Simulation Locomotion using Deep Q Network
Zhou et al. The path trajectory planning of swinging legs for humanoid robot
CN117850240A (en) Brain-computer sharing control method and system for air-ground cooperative unmanned system

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant