CN114415507B - Deep neural network-based smart hand-held process dynamics model building and training method - Google Patents

Deep neural network-based smart hand-held process dynamics model building and training method

Info

Publication number
CN114415507B
Authority
CN
China
Prior art keywords
state
model
training
data
smart
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210017100.XA
Other languages
Chinese (zh)
Other versions
CN114415507A (en)
Inventor
周锦瑜
盛士能
王壮
祝雯豪
俞冰清
鲍官军
胥芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210017100.XA priority Critical patent/CN114415507B/en
Publication of CN114415507A publication Critical patent/CN114415507A/en
Application granted granted Critical
Publication of CN114415507B publication Critical patent/CN114415507B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a deep neural network-based dexterous hand grasping process dynamics model building and training method, which comprises the following steps. Step 1: the dexterous hand interacts with the grasped object in the environment and trains using the SAC algorithm policy, and the system state transition data are collected and placed into a sample buffer. Step 2: the number of fuzzy clustering categories is set and fuzzy clustering is performed on the state transition data in the sample buffer. Step 3: a dexterous hand dynamics model f containing a state increment direction probability sub-network model and a state increment amplitude sub-network model is built. Step 4: for each fuzzy clustering category, sample sampling probabilities are generated according to the membership degrees, and training samples are obtained by sampling. Step 5: the dexterous hand dynamics model is trained and used to predict the state of the environmental system. The method improves the accuracy of the dynamics model, reduces the local over-fitting phenomenon during dynamics model training, reduces the prediction error of the dynamics model, and improves stability.

Description

Deep neural network-based smart hand-held process dynamics model building and training method
Technical Field
The invention belongs to the field of reinforcement learning control of dexterous hands, and particularly relates to a deep neural network-based dynamics model building and training method for the grasping process of a dexterous hand.
Background
Because of the high degree of freedom of the dexterous hand, improving the control effect of dexterous hand reinforcement learning control algorithms and the utilization rate of training sample data has become a difficulty in this field. Currently, reinforcement learning control algorithms can be classified into model-free and model-based reinforcement learning algorithms according to whether the agent understands the dynamics model of the environment and of itself. Through a reinforcement learning algorithm, the agent repeatedly interacts with the same environment by trial and error for a specific task and receives environmental feedback rewards in the process, thereby changing its behavior so as to maximize the environmental feedback reward in the next interaction with the environment.
Model-free reinforcement learning control algorithms currently in mainstream use for dexterous hand control include Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO) and the like. These model-free algorithms achieve good control performance, but their sample utilization is low and a large number of samples must be collected, which requires a large time cost and is often difficult to apply in reality. Model-based reinforcement learning control algorithms, by contrast, have the advantage of better utilizing the sample data. The current mainstream model-based reinforcement learning control algorithms include AlphaZero, Imagination-Augmented Agents (I2A), MBMF and the like. These algorithms either require domain experts to provide a system dynamics model based on expert knowledge, or require the system dynamics to be learned from interaction with the environment. However, a system composed of a dexterous hand and a grasped object contains a large number of collision scenarios, so it is difficult to give a system dynamics model in advance, and an approximate system dynamics model must be obtained by supervised learning. Examples of supervised learning of dynamics models include Sparse Identification of Nonlinear Dynamics (SINDy), dynamics parameter identification, and neural network fitting of dynamics models. SINDy and related methods require a dictionary of candidate functions to be given, and are therefore difficult to apply. Dynamics parameter identification requires the structure of the system dynamics model to be given in advance and is not applicable to a collision-rich system consisting of a dexterous hand and a grasped object. Existing methods that fit the dynamics model with a neural network suffer from poor stability and are prone to over-fitting.
Disclosure of Invention
The invention aims to provide a deep neural network-based dynamics model building and training method for the dexterous hand grasping process, in order to solve the technical problems that the system dynamics model cannot be given in advance, that providing a dictionary of candidate functions is difficult, and that a neural network fitting the dynamics model has poor stability and is prone to over-fitting.
To solve the above technical problems, the specific technical scheme of the deep neural network-based dexterous hand grasping process dynamics model building and training method is as follows:
A deep neural network-based dynamics model building and training method for the dexterous hand grasping process comprises the following steps:
Step 1: the dexterous hand interacts with the grasped object in the environment and trains using the SAC algorithm policy π_θ; system state transition data are collected and placed into a sample buffer;
Step 2: the number of fuzzy clustering categories is set and fuzzy clustering is performed on the state transition data in the sample buffer;
Step 3: a dexterous hand dynamics model f comprising a state increment direction probability sub-network model f_d and a state increment amplitude sub-network model f_a is built;
Step 4: for each fuzzy clustering category, sample sampling probabilities are generated according to the membership degrees, and training samples are obtained by sampling;
Step 5: the dexterous hand dynamics model is trained and used to predict the state of the environmental system.
Further, step 1 uses a MuJoCo physics simulation engine to simulate the grasping process of the dexterous hand and the grasped object; the simulation environment continuously generates Gaussian-distributed external force and torque noise applied to the centroid of the grasped object and to the torques of the dexterous hand joints, so as to simulate random external force disturbances in a real scene. As time passes in the simulator, the states of the dexterous hand and the grasped object change, and the whole process conforms to a Markov decision process represented by the quintuple <S, A, P, R, γ>, wherein S represents the system state space formed by the dexterous hand and the grasped object, A represents the dexterous hand joint action space, P represents the state transition probability, R represents the reward space, and γ represents the reward discount coefficient.
Further, step 1 uses the Actor network of the model-free reinforcement learning algorithm SAC as the dexterous hand control policy π_θ and sets the system goal g as grasping the object to a random orientation; if the grasped object falls, the simulation is regarded as ended and the simulation environment is reset. The state transition data (s, a, s', r) of the dexterous hand and the grasped object in the simulator are recorded, where s is the system state at the current time, a is the system input action at the current time, s' is the system state at the next time, and r is the reward value calculated according to the grasping goal. The state transition data are stored to obtain the data set Data:
Data = {(s_1, a_1, s_2, r_1), (s_2, a_2, s_3, r_2), ..., (s_{n-1}, a_{n-1}, s_n, r_{n-1})};
and the Actor and Critic networks are trained using Data.
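As an illustration of step 1, the following Python sketch shows how the state transition tuples could be collected; make_grasp_env-style environment objects and a policy object exposing select_action are hypothetical stand-ins for the MuJoCo grasping environment and the SAC Actor, which are not specified here beyond their roles.

    # Minimal sketch, assuming a Gym-style MuJoCo grasping environment and an SAC policy object.
    def collect_transitions(env, policy, buffer, n_steps=10000):
        """Roll out the policy pi_theta and store (s, a, s', r) tuples in the sample buffer."""
        s = env.reset()
        for _ in range(n_steps):
            a = policy.select_action(s)            # dexterous hand joint action from pi_theta
            s_next, r, done, _ = env.step(a)       # reward r computed from the grasping goal
            buffer.append((s, a, s_next, r))       # system state transition data
            s = env.reset() if done else s_next    # reset when the grasped object falls
        return buffer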
Further, step 2 comprises the following specific steps:
Fuzzy clustering is performed on the data set Data, and a fuzzy clustering center set C = {c_1, c_2, ..., c_k} is set randomly, where each clustering center c contains the same number of elements as the system state s. The Euclidean distance d between each state s in the data set Data and each clustering center c is calculated to obtain the distance matrix D_t = [d_ij], where d_ij = ‖s_i - c_j‖ represents the Euclidean distance between the i-th state and the j-th clustering center. The fuzzy clustering center set C is adjusted so that the sum of squares of all elements of the distance matrix D_t is minimized. The membership degree u of each state s in the data set Data to each clustering category is then calculated to obtain the membership matrix U = [u_ij], where u_ij represents the membership of the i-th state to the j-th clustering category.
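The membership computation is not written out explicitly above; the sketch below uses the standard fuzzy C-means update with fuzzifier m = 2 as an assumed concrete choice for obtaining the cluster centers C and the membership matrix U from the collected states.

    import numpy as np

    def fuzzy_c_means(states, k, m=2.0, iters=100, eps=1e-9):
        """Standard fuzzy C-means: returns cluster centers (k x dim) and membership matrix U (n x k)."""
        n = states.shape[0]
        centers = states[np.random.choice(n, k, replace=False)]    # random initial centers c_1..c_k
        for _ in range(iters):
            d = np.linalg.norm(states[:, None, :] - centers[None, :, :], axis=2) + eps      # d_ij
            u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)  # u_ij
            centers = (u.T ** m @ states) / np.sum(u.T ** m, axis=1, keepdims=True)         # update C
        return centers, u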
Further, step 3 comprises the following specific steps:
The state increment direction probability sub-network model f_d and the state increment amplitude sub-network model f_a are built using the PyTorch deep neural network framework; the inputs of f_d and f_a comprise the state s of the dexterous hand and grasped object system and the dexterous hand joint input action a; each sub-network is composed of three linear layers, two ReLU layers and two positive/negative polarity channel layers, and f_d additionally has a Sigmoid layer at the tail of the network; the outputs of f_d and f_a are respectively the direction and the absolute value of the system state change Δs.
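The positive and negative polarity channel layer is not defined in detail, so the sketch below treats it as a layer that splits each activation into its positive and negative parts (an assumption); the three linear layers, two ReLU layers and the Sigmoid tail of f_d follow the description above, while the hidden width and the state/action dimensions are placeholders.

    import torch
    import torch.nn as nn

    class PolarityChannel(nn.Module):
        """Assumed polarity channel layer: concatenates the positive and negative parts of the input."""
        def forward(self, x):
            return torch.cat([torch.relu(x), torch.relu(-x)], dim=-1)

    class DeltaSubNet(nn.Module):
        """Skeleton shared by f_d and f_a: three linear layers, two ReLU layers, two polarity layers."""
        def __init__(self, state_dim, action_dim, hidden=256, sigmoid_tail=False):
            super().__init__()
            layers = [nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), PolarityChannel(),
                      nn.Linear(2 * hidden, hidden), nn.ReLU(), PolarityChannel(),
                      nn.Linear(2 * hidden, state_dim)]
            if sigmoid_tail:
                layers.append(nn.Sigmoid())          # f_d outputs a direction probability
            self.net = nn.Sequential(*layers)

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1))

    f_d = DeltaSubNet(state_dim=61, action_dim=20, sigmoid_tail=True)    # placeholder dimensions
    f_a = DeltaSubNet(state_dim=61, action_dim=20, sigmoid_tail=False)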
Further, step 4 comprises the following specific steps:
A dynamics model training sample is drawn for each clustering category. The sampling probability p of each state s of the data set Data within each category is calculated from the membership matrix U to obtain the probability matrix P = [p_ij], where p_ij represents the probability that the i-th state is sampled in the j-th clustering category; if state s_i is sampled, (s_i, a_i, s'_i) is taken as a training sample.
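The rule turning membership degrees into sampling probabilities is only stated qualitatively, so the sketch below normalizes each membership column into a probability distribution (one plausible reading) and draws one training sample per cluster category.

    import numpy as np

    def sample_per_category(u, data, rng=None):
        """u: (n, k) membership matrix U; data: list of (s, a, s_next, r) tuples from the buffer."""
        rng = rng or np.random.default_rng()
        p = u / u.sum(axis=0, keepdims=True)        # p_ij: probability of state i within category j
        samples = []
        for j in range(u.shape[1]):
            i = rng.choice(len(data), p=p[:, j])    # draw a state index for cluster category j
            s, a, s_next, _ = data[i]
            samples.append((s, a, s_next))          # training sample (s_i, a_i, s'_i)
        return samples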
Further, step 5 comprises the following specific steps:
Training f_d, with the loss function set as:
J_traind(α) = E_{(s,a,s')~Data(P)}[(f_d(s,a) - g(s'-s))²] + 0.0005‖α‖²
where g(·) maps the state increment s'-s to its direction label, and α denotes all parameters of f_d;
a gradient descent method is used, with Adam as the optimizer;
Training f_a, with the loss function set as:
J_traina(β) = E_{(s,a,s')~Data(P)}[(f_a(s,a) - |s'-s|)²] + 0.0005‖β‖²
where β denotes all parameters of f_a;
a gradient descent method is used, with Adam as the optimizer;
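A minimal training sketch for the two losses follows; it assumes g(x) is the elementwise 0/1 indicator of a positive state increment (consistent with the Sigmoid output of f_d but not stated explicitly above), adds the 0.0005-weighted squared parameter norm directly to the loss, and uses an assumed learning rate for Adam.

    import torch

    def l2_penalty(model, coef=5e-4):
        """The 0.0005 * ||parameters||^2 term appearing in both loss functions."""
        return coef * sum((p ** 2).sum() for p in model.parameters())

    def train_step(f_d, f_a, opt_d, opt_a, s, a, s_next):
        """One gradient descent step on J_traind(alpha) and J_traina(beta)."""
        delta = s_next - s
        g = (delta > 0).float()                                    # assumed direction label g(s'-s)
        loss_d = ((f_d(s, a) - g) ** 2).mean() + l2_penalty(f_d)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        loss_a = ((f_a(s, a) - delta.abs()) ** 2).mean() + l2_penalty(f_a)
        opt_a.zero_grad(); loss_a.backward(); opt_a.step()
        return loss_d.item(), loss_a.item()

    # opt_d = torch.optim.Adam(f_d.parameters(), lr=1e-3)   # assumed learning rate
    # opt_a = torch.optim.Adam(f_a.parameters(), lr=1e-3)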
Further, step 5 uses the dexterous hand dynamics model f comprising the state increment direction probability sub-network model f_d and the state increment amplitude sub-network model f_a: the state s of the current dexterous hand and grasped object and the dexterous hand joint input action a are fed into f_d and f_a to obtain the state increment direction probability value and the state increment amplitude value, from which the predicted system state at the next time is obtained, where the direction dir ~ f_d(s, a).
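The exact expression combining the two sub-network outputs into the next-state prediction is given only implicitly; a plausible reconstruction, assuming the direction probability is mapped to a signed direction in {-1, +1} before being multiplied by the amplitude, is sketched below.

    import torch

    def predict_next_state(f_d, f_a, s, a, sample=False):
        """Assumed reconstruction: s_hat' = s + dir * f_a(s, a), with dir derived from f_d(s, a)."""
        prob = f_d(s, a)                                    # state increment direction probability
        if sample:
            dir_ = torch.bernoulli(prob) * 2.0 - 1.0        # sampled direction, dir ~ f_d(s, a)
        else:
            dir_ = (prob > 0.5).float() * 2.0 - 1.0         # most likely direction
        return s + dir_ * f_a(s, a)                         # add the amplitude along that direction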
The deep neural network-based dexterous hand grasping process dynamics model building and training method of the invention has the following advantages. The invention designs a deep neural network dynamics model comprising a state increment direction probability sub-network model and a state increment amplitude sub-network model, and uses the two sub-networks to separately predict the direction and the amplitude of the system state increment, which improves the accuracy of the dynamics model. At the same time, the data samples are fuzzy-clustered and the training samples are preprocessed, which reduces the local over-fitting phenomenon during dynamics model training. This further reduces the prediction error of the dynamics model, improves stability, and improves the control effect at the control algorithm level.
Drawings
FIG. 1 is a block diagram of a dexterous hand grasping process in accordance with the present invention;
FIG. 2 is a fuzzy clustering flow chart in accordance with the present invention;
FIG. 3 is a diagram of the model structure of f_d in the present invention;
FIG. 4 is a diagram of the model structure of f_a in the present invention;
FIG. 5 is a diagram of the usage framework of the dexterous hand dynamics model f of the present invention.
Detailed Description
To better understand the purpose, structure and functions of the invention, the deep neural network-based dexterous hand grasping process dynamics model building and training method of the invention is described in further detail below with reference to the accompanying drawings.
In the deep neural network-based dynamics model building and training method for the dexterous hand grasping process, system state transition data of the system formed by the dexterous hand and the grasped object are collected from the environment, fuzzy clustering preprocessing is carried out on the system state transition data, the preprocessed data are sampled to obtain dynamics model training samples, the dexterous hand dynamics model is trained, and the trained model is used to predict the state of the system formed by the dexterous hand and the grasped object at the next moment.
The method comprises the following steps: (1) the dexterous hand interacts with the grasped object in the environment and trains using the SAC algorithm policy π_θ, and system state transition data are collected and placed into a sample buffer; (2) the number of fuzzy clustering categories is set and fuzzy clustering is performed on the state transition data in the sample buffer; (3) a dexterous hand dynamics model f comprising a state increment direction probability sub-network model f_d and a state increment amplitude sub-network model f_a is built; (4) for each fuzzy clustering category, sample sampling probabilities are generated according to the membership degrees, and training samples are obtained by sampling; (5) the dexterous hand dynamics model is trained and used to predict the state of the environmental system.
The grasping process of the dexterous hand and the grasped object is preferably simulated using the MuJoCo physics simulation engine. The simulation environment continuously generates Gaussian-distributed external force and torque noise applied to the centroid of the grasped object and to the torques of the dexterous hand joints, simulating random external force disturbances in a real scene. As the states of the dexterous hand and the grasped object change over time within the simulator, the overall process conforms to a Markov Decision Process (MDP), which can be represented by the quintuple <S, A, P, R, γ>, where S represents the system state space formed by the dexterous hand and the grasped object, A represents the dexterous hand joint action space, P represents the state transition probability, R represents the reward space, and γ represents the reward discount coefficient.
The Actor network of the model-free reinforcement learning algorithm SAC is preferably used as the dexterous hand control policy π_θ, and the system goal g is set as grasping the object to a random orientation; if the grasped object falls, the simulation is regarded as ended and the simulation environment is reset. The state transition data (s, a, s', r) of the dexterous hand and the grasped object in the simulator are recorded, where s is the system state at the current time, a is the system input action at the current time, s' is the system state at the next time, and r is the reward value calculated according to the grasping goal. The state transition data are stored to obtain the data set Data:
Data = {(s_1, a_1, s_2, r_1), (s_2, a_2, s_3, r_2), ..., (s_{n-1}, a_{n-1}, s_n, r_{n-1})};
and the Actor and Critic networks are trained using Data.
Preferably, fuzzy clustering is performed on the data set Data, and a fuzzy clustering center set C = {c_1, c_2, ..., c_k} is set randomly, where each clustering center c contains the same number of elements as the system state s. The Euclidean distance d between each state s in the data set Data and each clustering center c is calculated to obtain the distance matrix D_t = [d_ij], where d_ij = ‖s_i - c_j‖ represents the Euclidean distance between the i-th state and the j-th clustering center. The fuzzy clustering center set C is adjusted so that the sum of squares of all elements of the distance matrix D_t is minimized. The membership degree u of each state s in the data set Data to each clustering category is then calculated to obtain the membership matrix U = [u_ij], where u_ij represents the membership of the i-th state to the j-th clustering category.
Preferably, a dynamics model training sample is drawn for each clustering category. The sampling probability p of each state s of the data set Data within each category is calculated from the membership matrix U to obtain the probability matrix P = [p_ij], where p_ij represents the probability that the i-th state is sampled in the j-th clustering category; if state s_i is sampled, (s_i, a_i, s'_i) is taken as a training sample.
The state increment direction probability sub-network model f_d and the state increment amplitude sub-network model f_a are preferably built using the PyTorch deep neural network framework. The inputs of f_d and f_a comprise the state s of the dexterous hand and grasped object system and the dexterous hand joint input action a; each sub-network is formed by three linear layers, two ReLU layers and two positive/negative polarity channel layers, and f_d additionally has a Sigmoid layer at the tail of the network. The outputs of f_d and f_a are respectively the direction and the absolute value of the system state change Δs.
f_d is trained with the loss function set as:
J_traind(α) = E_{(s,a,s')~Data(P)}[(f_d(s,a) - g(s'-s))²] + 0.0005‖α‖²
where g(·) maps the state increment s'-s to its direction label, and α denotes all parameters of f_d.
A gradient descent method is used, with Adam as the optimizer.
f_a is trained with the loss function set as:
J_traina(β) = E_{(s,a,s')~Data(P)}[(f_a(s,a) - |s'-s|)²] + 0.0005‖β‖²
where β denotes all parameters of f_a.
A gradient descent method is used, with Adam as the optimizer.
The dexterous hand dynamics model f comprising the state increment direction probability sub-network model f_d and the state increment amplitude sub-network model f_a is preferably used: the state s of the current dexterous hand and grasped object and the dexterous hand joint input action a are fed into f_d and f_a to obtain the state increment direction probability value and the state increment amplitude value, from which the predicted system state at the next time is obtained, where the direction dir ~ f_d(s, a).
The invention will be further elucidated with reference to specific examples.
The invention designs a deep neural network-based dynamics model building and training method for the dexterous hand grasping process, which is used for reinforcement learning of object grasping; the grasping process is shown in the structural block diagram of Fig. 1.
Step 1: A simulation environment is built in the MuJoCo simulator according to the three-dimensional model of the dexterous hand, the three-dimensional model of the grasped object, and the dynamics parameters. Dexterous hand joint drivers, joint angle sensors, angular velocity sensors and torque sensors, dexterous fingertip touch sensors, and position and velocity sensors for the grasped object are arranged. Gaussian-distributed external force noise is applied to the dexterous hand joints and the centroid of the grasped object to simulate unpredictable noise disturbances in a real environment. The system state s comprises the joint angles, angular velocities and torques, the position and velocity of the grasped object, and the contact forces at the dexterous fingertips. The system input action a contains the output values of the dexterous hand joint drivers.
Step 2: Using the policy π_θ, a dexterous hand joint driver action a is generated according to the current system state s in the MuJoCo simulation environment, and the simulation is advanced to obtain the system state s' at the next time; the current reward value r is calculated according to the grasping goal. The state transition probability P is set to 1, i.e. a deterministic environment, and the reward discount coefficient γ is set to 0.99. The simulation time step is 0.02 seconds. The MDP data are stored in the sample buffer, and the Actor and Critic models are trained on the data in the sample buffer using the SAC algorithm.
Step 3: Fuzzy clustering is performed on the data in the sample buffer. First, the number of category centers of the fuzzy clustering is determined and the category centers are randomly initialized. Then, as shown in Fig. 2, the Euclidean distance between each sample and each category center is calculated, and the category centers are updated according to the total squared Euclidean distance until convergence. Finally, the membership degree of each sample to each category is calculated from the Euclidean distance between the sample and the category center, and the probability of the sample being sampled within the category is then obtained from the membership degree.
Step 4: The state increment direction probability sub-network model f_d and the state increment amplitude sub-network model f_a are built using the PyTorch deep neural network framework; the structure of f_d is shown in Fig. 3 and the structure of f_a is shown in Fig. 4. The inputs of f_d and f_a comprise the state s of the dexterous hand and grasped object system and the dexterous hand joint input action a; each sub-network consists of three linear layers, two ReLU layers and two positive/negative polarity channel layers, and f_d additionally has a Sigmoid layer at the tail of the network. The outputs of f_d and f_a are respectively the direction and the absolute value of the system state change Δs. Training samples are drawn for each category as in step 3, and f_d and f_a are trained separately.
Step 5: The dexterous hand dynamics model f comprising the state increment direction probability sub-network model f_d and the state increment amplitude sub-network model f_a is used. As shown in Fig. 5, a state s is obtained by sampling the sample buffer, and an action a is then generated according to the current policy π_θ. The sampled state s and action a are fed into f_d and f_a to obtain the state increment direction probability value and the state increment amplitude value, from which the predicted state at the next time is obtained, where the direction dir ~ f_d(s, a). The reward value is calculated according to the set grasping goal, and the resulting predicted transition is used to train the Actor and Critic networks.
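Step 5 can be read as a model-based data augmentation loop; the sketch below illustrates that reading, where compute_reward, policy.select_action and agent.update are hypothetical stand-ins for the grasping reward function and the SAC agent interfaces, and the direction mapping repeats the assumption made in the prediction sketch above.

    import random
    import torch

    def imagined_update(buffer, policy, f_d, f_a, agent, compute_reward, n_updates=64):
        """Generate imagined transitions with the learned dynamics model f and train the SAC networks."""
        for _ in range(n_updates):
            s, _, _, _ = random.choice(buffer)                  # state sampled from the sample buffer
            a = policy.select_action(s)                         # action from the current policy pi_theta
            s_t = torch.as_tensor(s, dtype=torch.float32)
            a_t = torch.as_tensor(a, dtype=torch.float32)
            with torch.no_grad():
                prob = f_d(s_t, a_t)                            # state increment direction probability
                direction = (prob > 0.5).float() * 2.0 - 1.0    # assumed mapping to {-1, +1}
                s_pred = s_t + direction * f_a(s_t, a_t)        # predicted next state
            r_pred = compute_reward(s_pred)                     # reward from the set grasping goal
            agent.update(s_t, a_t, s_pred, r_pred)              # SAC Actor/Critic update (hypothetical API)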
It will be understood that the application has been described in terms of several embodiments, and that various changes and equivalents may be made to these features and embodiments by those skilled in the art without departing from the spirit and scope of the application. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the application without departing from the essential scope thereof. Therefore, it is intended that the application not be limited to the particular embodiment disclosed, but that the application will include all embodiments falling within the scope of the appended claims.

Claims (1)

1. A deep neural network-based dexterous hand grasping process dynamics model building and training method, characterized by comprising the following steps:
Step 1: the dexterous hand interacts with the grasped object in the environment and trains using the SAC algorithm policy π_θ; system state transition data are collected and placed into a sample buffer;
using a MuJoCo physics simulation engine to simulate the grasping process of the dexterous hand and the grasped object, the simulation environment continuously generating Gaussian-distributed external force and torque noise applied to the centroid of the grasped object and to the torques of the dexterous hand joints so as to simulate random external force disturbances in a real scene; as time passes within the simulator, the states of the dexterous hand and the grasped object change, and the whole process conforms to a Markov decision process represented by the quintuple <S, A, P, R, γ>, wherein S represents the system state space formed by the dexterous hand and the grasped object, A represents the dexterous hand joint action space, P represents the state transition probability, R represents the reward space, and γ represents the reward discount coefficient;
using the Actor network of the model-free reinforcement learning algorithm SAC as the dexterous hand control policy π_θ, setting the system goal g as grasping the object to a random orientation, regarding the simulation as ended and resetting the simulation environment if the grasped object falls, recording the state transition data (s, a, s', r) of the dexterous hand and the grasped object in the simulator, wherein s is the system state at the current time, a is the system input action at the current time, s' is the system state at the next time, and r is the reward value calculated according to the grasping goal, and storing the state transition data to obtain the data set Data:
Data = {(s_1, a_1, s_2, r_1), (s_2, a_2, s_3, r_2), ..., (s_{n-1}, a_{n-1}, s_n, r_{n-1})};
training the Actor and Critic networks using Data;
Step 2: setting the number of fuzzy clustering categories and carrying out fuzzy clustering on the state transition data in the sample buffer;
carrying out fuzzy clustering on the data set Data, and randomly setting a fuzzy clustering center set C = {c_1, c_2, ..., c_k}, wherein each clustering center c contains the same number of elements as the system state s; calculating the Euclidean distance d between each state s in the data set Data and each clustering center c to obtain the distance matrix D_t = [d_ij], wherein d_ij = ‖s_i - c_j‖ represents the Euclidean distance between the i-th state and the j-th clustering center; adjusting the fuzzy clustering center set C so that the sum of squares of all elements of the distance matrix D_t is minimized; calculating the membership degree u of each state s in the data set Data to each clustering category to obtain the membership matrix U = [u_ij], wherein u_ij represents the membership of the i-th state to the j-th clustering category;
Step 3: building a dexterous hand dynamics model f comprising a state increment direction probability sub-network model f_d and a state increment amplitude sub-network model f_a;
building the state increment direction probability sub-network model f_d and the state increment amplitude sub-network model f_a using the PyTorch deep neural network framework, wherein the inputs of f_d and f_a comprise the state s of the dexterous hand and grasped object system and the dexterous hand joint input action a, each sub-network is composed of three linear layers, two ReLU layers and two positive/negative polarity channel layers, and f_d additionally has a Sigmoid layer at the tail of the network; the outputs of f_d and f_a are respectively the direction and the absolute value of the system state change Δs;
Step 4: generating sample sampling probabilities according to the membership degrees for each fuzzy clustering category, and sampling to obtain training samples;
sampling a dynamics model training sample for each clustering category; calculating the sampling probability p of each state s of the data set Data within each category according to the membership matrix U to obtain the probability matrix P = [p_ij], wherein p_ij represents the probability that the i-th state is sampled in the j-th clustering category; if state s_i is sampled, taking (s_i, a_i, s'_i) as a training sample;
Step 5: training the dexterous hand dynamics model, and predicting the state of the environmental system;
training f_d with the loss function set as:
J_traind(α) = E_{(s,a,s')~Data(P)}[(f_d(s,a) - g(s'-s))²] + 0.0005‖α‖²
wherein g(·) maps the state increment s'-s to its direction label, and α denotes all parameters of f_d;
using a gradient descent method with Adam as the optimizer;
training f_a with the loss function set as:
J_traina(β) = E_{(s,a,s')~Data(P)}[(f_a(s,a) - |s'-s|)²] + 0.0005‖β‖²
wherein β denotes all parameters of f_a;
using a gradient descent method with Adam as the optimizer;
using the dexterous hand dynamics model f comprising the state increment direction probability sub-network model f_d and the state increment amplitude sub-network model f_a, feeding the state s of the current dexterous hand and grasped object and the dexterous hand joint input action a into f_d and f_a to obtain the state increment direction probability value and the state increment amplitude value, thereby obtaining the predicted system state at the next time, wherein the direction dir ~ f_d(s, a).
CN202210017100.XA 2022-01-07 2022-01-07 Deep neural network-based smart hand-held process dynamics model building and training method Active CN114415507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210017100.XA CN114415507B (en) 2022-01-07 2022-01-07 Deep neural network-based smart hand-held process dynamics model building and training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210017100.XA CN114415507B (en) 2022-01-07 2022-01-07 Deep neural network-based smart hand-held process dynamics model building and training method

Publications (2)

Publication Number Publication Date
CN114415507A CN114415507A (en) 2022-04-29
CN114415507B true CN114415507B (en) 2024-05-28

Family

ID=81272280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210017100.XA Active CN114415507B (en) 2022-01-07 2022-01-07 Deep neural network-based smart hand-held process dynamics model building and training method

Country Status (1)

Country Link
CN (1) CN114415507B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116088307B (en) * 2022-12-28 2024-01-30 中南大学 Multi-working-condition industrial process prediction control method, device, equipment and medium based on error triggering self-adaptive sparse identification
CN115816466B (en) * 2023-02-02 2023-06-16 中国科学技术大学 Method for improving control stability of vision observation robot

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101317794A (en) * 2008-03-11 2008-12-10 清华大学 Myoelectric control ability detecting and training method for hand-prosthesis with multiple fingers and multiple degrees of freedom
CN103592932A (en) * 2013-12-02 2014-02-19 哈尔滨工业大学 Modularized embedded control system for multi-finger myoelectric artificial hand with various sensing functions
CN107030694A (en) * 2017-04-20 2017-08-11 南京航空航天大学 Tendon drives manipulator tendon tension restriction end power bit manipulation control method and device
CN109657706A (en) * 2018-12-03 2019-04-19 浙江工业大学 Flexible part assembling process contact condition recognition methods based on gauss hybrid models bayesian algorithm
CN110298886A (en) * 2019-07-01 2019-10-01 中国科学技术大学 A kind of Dextrous Hand Grasp Planning method based on level Four convolutional neural networks
CN112668190A (en) * 2020-12-30 2021-04-16 长安大学 Method, system, equipment and storage medium for constructing three-finger smart hand controller
CN113657533A (en) * 2021-08-24 2021-11-16 河海大学 Multi-element time sequence segmentation clustering method for space-time scene construction

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101317794A (en) * 2008-03-11 2008-12-10 清华大学 Myoelectric control ability detecting and training method for hand-prosthesis with multiple fingers and multiple degrees of freedom
CN103592932A (en) * 2013-12-02 2014-02-19 哈尔滨工业大学 Modularized embedded control system for multi-finger myoelectric artificial hand with various sensing functions
CN107030694A (en) * 2017-04-20 2017-08-11 南京航空航天大学 Tendon drives manipulator tendon tension restriction end power bit manipulation control method and device
CN109657706A (en) * 2018-12-03 2019-04-19 浙江工业大学 Flexible part assembling process contact condition recognition methods based on gauss hybrid models bayesian algorithm
CN110298886A (en) * 2019-07-01 2019-10-01 中国科学技术大学 A kind of Dextrous Hand Grasp Planning method based on level Four convolutional neural networks
CN112668190A (en) * 2020-12-30 2021-04-16 长安大学 Method, system, equipment and storage medium for constructing three-finger smart hand controller
CN113657533A (en) * 2021-08-24 2021-11-16 河海大学 Multi-element time sequence segmentation clustering method for space-time scene construction

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Neural and genetic basis of dexterous hand movements"; Yutaka Yoshida et al.; Elsevier Ltd; 2018-04-24; entire document *
"Research status, trends and challenges of robotic multi-fingered dexterous hands"; Cai Shibo et al.; Journal of Mechanical Engineering; 2021-08-31; vol. 57, no. 15, pp. 1-14 *
"Fingertip contact mechanics modeling for soft multi-fingered hands"; Zhang Lingfeng et al.; High Technology Letters; 2020-04-30; vol. 30, no. 4, pp. 391-401 *

Also Published As

Publication number Publication date
CN114415507A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN110262511B (en) Biped robot adaptive walking control method based on deep reinforcement learning
CN112668235B (en) Robot control method based on off-line model pre-training learning DDPG algorithm
CN114415507B (en) Deep neural network-based smart hand-held process dynamics model building and training method
Muratore et al. Data-efficient domain randomization with bayesian optimization
Grzeszczuk et al. Neuroanimator: Fast neural network emulation and control of physics-based models
CN110991027A (en) Robot simulation learning method based on virtual scene training
CN109702740B (en) Robot compliance control method, device, equipment and storage medium
Yao et al. Direct policy transfer via hidden parameter markov decision processes
CN110442129B (en) Control method and system for multi-agent formation
CN111260027A (en) Intelligent agent automatic decision-making method based on reinforcement learning
WO2020118730A1 (en) Compliance control method and apparatus for robot, device, and storage medium
Kebria et al. Deep imitation learning: The impact of depth on policy performance
Belmonte-Baeza et al. Meta reinforcement learning for optimal design of legged robots
CN113281999A (en) Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning
CN116027669A (en) Self-adaptive sliding mode control method and system for high-speed train and electronic equipment
Torabi et al. Sample-efficient adversarial imitation learning from observation
Ma et al. An efficient robot precision assembly skill learning framework based on several demonstrations
Hilleli et al. Toward deep reinforcement learning without a simulator: An autonomous steering example
Brosseit et al. Distilled domain randomization
CN115366099A (en) Mechanical arm depth certainty strategy gradient training method based on forward kinematics
Hachiya et al. Efficient sample reuse in EM-based policy search
CN114386620A (en) Offline multi-agent reinforcement learning method based on action constraint
El-Fakdi et al. Autonomous underwater vehicle control using reinforcement learning policy search methods
Zhang et al. Tracking control for mobile robot based on deep reinforcement learning
Muratore Randomizing physics simulations for robot learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant